Genetic Markers for Breast and Ovarian Cancer



The research carried out in the subject application was supported in part by grants from the
National Institutes of Health. The government may have rights in any patent issuing on this
application.

INTRODUCTION 

Field of the Invention

The field of the invention is genetic markers for inheritable breast cancer susceptibility.

Background 

The largest proportion of inherited breast cancer described so far has been attributed to 
a genetic locus, the BRCA1 locus, on chromosome 17q21 (Hall et al. 1990 Science 250:1684- 
1689; Narod et al. 1991 Lancet 338:82-83; Easton et al. 1993 Am J Hum Genet 52:678-701). 

Background material on genetic markers for breast cancer screening is found in the Jan 29, 1993
issue of Science, vol 259, especially pages 622-625; see also King et al., 1993 J Amer Med Assoc
269:1975-198. Other relevant research papers include King (1992) Nature Genet 2:125-126;
Merette et al. (1992) Amer J Human Genet 50:515-519; and NIH/CEPH Collaborative Mapping Group
(1992) Science 258:67-86.

Risks of breast cancer to women who inherit a susceptibility allele at this locus are extremely
high, exceeding 50% before age 50 and reaching 80% by age 65 (Newman et al. 1988 Proc Natl Acad
Sci USA 85:3044-3048; Hall et al. 1992 Amer J Human Genet 50:1235-1242; Easton et al. 1993).
Epidemiological evidence for inherited susceptibility to ovarian cancer is even stronger (Cramer
et al. 1983 J Natl Cancer Inst 71:711-716; Schildkraut & Thompson 1988 Amer J Epidemiol
128:456-466; Schildkraut et al. 1989 Amer J Hum Genet 45:521-529). According to one study,
more than 90% of families with multiple relatives affected by breast and ovarian cancer trace
disease susceptibility to chromosome 17q21 (Easton et al. 1993).

The link between inherited susceptibility to breast and ovarian cancer and the risk of these
diseases can be exploited by applying genetics to diagnosis and prevention. Creating molecular
tools for earlier diagnosis and developing ways to reverse the first steps of tumorigenesis may be
the most effective means of breast and ovarian cancer control.

Our laboratory previously mapped the heritable breast cancer susceptibility gene locus (BRCA1
locus) to a 50 cM region of chromosome 17q (Hall et al. 1990). More recently, we developed new
polymorphisms at ERBB2 (Hall and King 1991 Nucl Acids Res 19:2515), THRA1 (Bowcock et al.
1993 Amer J Human Genet 52:718-722), EDH17B (Friedman et al. 1993 Hum Molec Genet 2:821),
and multiple anonymous loci (Anderson et al. 1993 Genomics 17:616-623), ultimately developing a
high-density map of 17q12-q21 (Anderson et al. 1993; see also Simard et al. 1993 Human Molec
Genet 2:1193-1199). We also added families to the genetic study; there are now 100 families for
whom transformed lymphocyte lines have been established and all informative relatives genotyped.
We used our new markers and the many chromosome 17q polymorphisms developed in the past three
years to test linkage in our families, refining the region first to 8 cM (Hall et al. 1992), then
to 4 cM (Bowcock et al. 1993), and then to 1 Mb based on polymorphisms from our high-density map
(Anderson et al. 1993; see also Flejter et al. 1993 Genomics 17:624-631). We disclose here a
number of mutations in BRCA1 that correlate with disease.

Relevant Literature

The predicted amino acid sequence of a BRCA1 cDNA and familial studies of this gene were
described by Miki et al. (1994) Science 266, 66-71 and Futreal et al. (1994) Science 266,
120-122. A study of Canadian cancer families is described in Simard et al. (1994) Nature
Genetics 8, 392-398. A collaborative survey of BRCA1 mutations is described in Shattuck-Eidens
et al. (1995) JAMA 273, 535-541.

SUMMARY OF THE INVENTION 
The invention discloses methods and compositions useful in the diagnosis and treatment of breast
and ovarian cancer associated with mutations and/or rare alleles of BRCA1, a breast cancer
susceptibility gene. Specific genetic probes diagnostic of inheritable breast cancer
susceptibility and methods of use are provided. Labelled nucleic acid probes comprising sequences
complementary to specified BRCA1 alleles are hybridized to clinical nucleic acid samples.
Linkage analysis and inheritance patterns of the disclosed markers are used to diagnose genetic
susceptibility. In addition, BRCA1 mutations and/or rare alleles are directly identified by
hybridization, polymorphism and/or sequence analysis. In another embodiment, labeled binding
agents, such as antibodies, specific for peptides encoded by the subject nucleic acids are used
to identify expression products of diagnostic mutations or alleles in patient-derived fluid or
tissue samples. For therapeutic intervention, the invention provides compositions that can
functionally interfere with the transcription or translation products of the breast and ovarian
cancer susceptibility-associated mutations and/or rare alleles within BRCA1. Such products
include antisense nucleic acids, competitive peptides encoded by the subject nucleic acids, and
high-affinity binding agents, such as antibodies, specific for, e.g., translation products of the
disclosed BRCA1 mutations and alleles.

DESCRIPTION OF SPECIFIC EMBODIMENTS 
We disclose here methods and compositions for determining the presence or absence of BRCA1
mutations and rare alleles, or translation products thereof, which are useful in the diagnosis of
breast and ovarian cancer susceptibility. Tumorigenic BRCA1 alleles include BRCA1 allele #5803
(SEQ ID NO:1), 9601 (SEQ ID NO:2), 9815 (SEQ ID NO:3), 8403 (SEQ ID NO:4), 8203 (SEQ ID NO:5),
388 (SEQ ID NO:6), 6401 (SEQ ID NO:7), 4406 (SEQ ID NO:8), 10201 (SEQ ID NO:9), 7408 (SEQ ID
NO:10), 582 (SEQ ID NO:11) and 77 (SEQ ID NO:12). These nucleic acids, or fragments capable of
specifically hybridizing with the corresponding allele in the presence of other BRCA1 alleles
under stringent conditions, find broad diagnostic and therapeutic application. Gene products of
the disclosed mutant and/or rare BRCA1 alleles also find a broad range of therapeutic and
diagnostic applications. For example, mutant and/or rare allelic BRCA1 peptides are used to
generate specific binding compounds, and these binding reagents are used diagnostically to
distinguish tumorigenic from non-tumorigenic wild-type BRCA1 translation products.

The subject nucleic acids (including fragments thereof) may be single or double stranded and are
isolated, partially purified, and/or recombinant. An "isolated" nucleic acid is present as other
than a naturally occurring chromosome or transcript in its natural state and is isolated from (not
joined in sequence to) at least one nucleotide with which it is normally associated on a natural
chromosome; a partially pure nucleic acid constitutes at least about 10%, preferably at least about
30%, and more preferably at least about 90% by weight of total nucleic acid present in a given
fraction; and a recombinant nucleic acid is joined in sequence to at least one nucleotide with
which it is not normally associated on a natural chromosome.

Fragments of the disclosed alleles are sufficiently long for use as specific hybridization probes
for detecting endogenous alleles, and particularly for distinguishing, under stringent conditions,
the disclosed critical rare or mutant alleles that correlate with cancer susceptibility from other
BRCA1 alleles, including alleles encoding the BRCA1 translation product displayed in Miki et al.
(1994) supra. Preferred fragments are capable of hybridizing to the corresponding mutant allele
under stringency conditions characterized by a hybridization buffer comprising 0% formamide in
0.9 M saline/0.09 M sodium citrate (SSC) buffer at a temperature of 37°C, and remaining bound
when subjected to washing at 42°C with the SSC buffer at 37°C. More preferred fragments hybridize
in a hybridization buffer comprising 20% formamide in 0.9 M saline/0.09 M sodium citrate (SSC)
buffer at a temperature of 42°C and remain bound when subjected to washing at 42°C with 2X SSC
buffer at 42°C. In any event, each fragment is necessarily long enough to be unique to the
corresponding allele, i.e. it has a nucleotide sequence at least long enough to define a novel
oligonucleotide, usually at least about 14, 16, 18, 20, 22, or 24 bp in length, though such a
fragment may be joined in sequence to other nucleotides, which may be nucleotides that naturally
flank the fragment.
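The uniqueness requirement can be checked mechanically. The sketch below is an illustration only, assuming the mutant and wild-type alleles are available as same-strand strings; it compares candidate k-mers against that single wild-type sequence, whereas a practical probe would also be screened against the opposite strand and the rest of the genome. The helper name `allele_specific_kmers` and the toy sequences are hypothetical.

```python
def allele_specific_kmers(mutant: str, wild_type: str, k: int = 20) -> list[tuple[int, str]]:
    """Return (0-based position, k-mer) pairs of `mutant` that never occur in `wild_type`.

    Hypothetical helper: k-mers absent from the wild-type allele are candidate
    allele-specific probe sequences of length k (same strand only).
    """
    wild_kmers = {wild_type[i:i + k] for i in range(len(wild_type) - k + 1)}
    return [
        (i, mutant[i:i + k])
        for i in range(len(mutant) - k + 1)
        if mutant[i:i + k] not in wild_kmers
    ]


if __name__ == "__main__":
    # Toy sequences (not BRCA1): a single G-to-A substitution at index 5.
    wild = "AACCGGTTACGT"
    mutant = "AACCGATTACGT"
    print(allele_specific_kmers(mutant, wild, k=4))
    # [(2, 'CCGA'), (3, 'CGAT'), (4, 'GATT'), (5, 'ATTA')]
```

With k in the 14 to 24 range given above, only windows that overlap the altered base survive, which is why allele-specific probes are centered on the mutation.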

In many applications, the nucleic acids are labelled with directly or indirectly detectable
signals or means for amplifying a detectable signal. Examples include radiolabels, luminescent
(e.g. fluorescent) tags, and components of amplified tags such as antigen-labelled antibody,
biotin-avidin combinations, etc. The nucleic acids can be subject to purification, synthesis,
modification, sequencing, recombination, incorporation into a variety of vectors, expression,
transfection, administration or methods of use disclosed in standard manuals such as Molecular
Cloning, A Laboratory Manual (2nd Ed., Sambrook, Fritsch and Maniatis, Cold Spring Harbor) and
Current Protocols in Molecular Biology (Eds. Ausubel, Brent, Kingston, Moore, Seidman, Smith and
Struhl, Greene Publ. Assoc., Wiley-Interscience, NY, NY, 1992), or that are otherwise known in
the art.

The subject nucleic acids are used in a wide variety of nucleic acid-based diagnostic methods
that are known to those in the art. Exemplary methods include their use as allele-specific
oligonucleotide probes (ASOs), in ligase-mediated methods for detecting mutations, as primers in
PCR-based methods, and in direct sequencing methods wherein the clinical BRCA1 nucleic acid
sequence is compared with the disclosed mutations and rare alleles. The subject nucleic acids are
capable of detecting the presence of a critical mutant or rare BRCA1 allele in a sample and of
distinguishing the mutant or rare allele from other BRCA1 alleles. For example, where the subject
nucleic acids are used as PCR primers or hybridization probes, the subject primer or probe






WO 96/33271 PCT/US96/05621 

comprises an oligonucleotide complementary to a strand of the mutant or rare allele, of length
sufficient to selectively hybridize with the mutant or rare allele. Generally, these primers and
probes comprise at least 16 bp to 24 bp complementary to the mutant or rare allele and may be as
large as is convenient for the hybridization conditions.

Where the critical mutation is a deletion of wild-type sequence, useful primers/probes require
wild-type sequence flanking both sides of the deletion, with at least 2, usually at least 3, more
usually at least 4, most usually at least 5 bases on each side. Where the mutation is an insertion
or substitution that exceeds about 20 bp, it is generally not necessary to include wild-type
sequence in the probes/primers. For insertions or substitutions of fewer than 5 bp, preferred
nucleic acid portions comprise the substitution/insertion and flank it with at least 2, preferably
at least 3, more preferably at least 4, most preferably at least 5 bases. For substitutions or
insertions from about 5 to about 20 bp, it is usually necessary to include both the entire
insertion/substitution and at least 2, usually at least 3, more usually at least 4, most usually
at least 5 bases of wild-type sequence on at least one flank of the substitution/insertion.
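The flank-length rules above can be expressed as a small decision procedure. This is a sketch under stated assumptions, not the applicants' procedure: positions are 0-based on the wild-type strand; the mutation replaces `wild_type[start:start + deleted_len]` with `inserted`; one `flank` length (default 5, the "most usually" value) is used throughout; flanking for changes under 5 bp is read as both sides; and for 5 to 20 bp changes the left flank is arbitrarily taken as the required single flank. The function name `design_probe` is hypothetical.

```python
def design_probe(wild_type: str, start: int, deleted_len: int,
                 inserted: str = "", flank: int = 5) -> str:
    """Return a probe sequence (same strand as wild_type) spanning a mutation.

    Hypothetical helper. A pure deletion has inserted == ""; a pure insertion
    has deleted_len == 0; a substitution has deleted_len == len(inserted).
    """
    if deleted_len == 0 and not inserted:
        raise ValueError("no mutation specified")
    left = wild_type[max(0, start - flank):start]
    right = wild_type[start + deleted_len:start + deleted_len + flank]
    n = len(inserted)
    if n == 0:
        # Deletion: wild-type bases on both sides of the new junction.
        return left + right
    if n < 5:
        # Short insertion/substitution: the change plus flanks on both sides.
        return left + inserted + right
    if n <= 20:
        # About 5-20 bp: the whole change plus one flank (left chosen here).
        return left + inserted
    # Longer than about 20 bp: the new sequence alone is generally specific.
    return inserted


if __name__ == "__main__":
    wt = "AAAAACCCCCGGGGGTTTTT"             # toy sequence, not BRCA1
    print(design_probe(wt, 10, 2))          # 2 bp deletion -> CCCCCGGGTT
    print(design_probe(wt, 10, 1, "A"))     # 1 bp substitution -> CCCCCAGGGGT
```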

In addition to their use as diagnostic genetic probes and primers, BRCA1 nucleic acids are used
to effect a variety of gene-based therapies. See, e.g., Zhu et al. (1993) Science 261, 209-211;
Gutierrez et al. (1992) Lancet 339, 715-721; Gary Nabel lab (Dec 1993), Proc. Nat'l. Acad. Sci.
USA. For example, therapeutic nucleic acids are used to modulate cellular expression or the
intracellular concentration or availability of a tumorigenic BRCA1 translation product by
introducing into cells complements of the disclosed nucleic acids. These nucleic acids are
typically antisense: single-stranded sequences comprising complements of the relevant disclosed
BRCA1 mutant. Antisense modulation of the expression of a given mutant may employ antisense
nucleic acids operably linked to gene regulatory sequences. Cells are transfected with a vector
comprising such a sequence with a promoter sequence oriented such that transcription of the gene
yields an antisense transcript capable of binding to the endogenous tumorigenic BRCA1 allele or
transcript. Transcription of the antisense nucleic acid may be constitutive or inducible, and the
vector may provide for stable extrachromosomal maintenance or integration. Alternatively,
single-stranded antisense nucleic acids that bind to BRCA1 genomic DNA or mRNA may be
administered to the target cell, in or temporarily isolated from a host, at a concentration that
results in a substantial reduction in expression of the targeted translation product.
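Because an antisense strand pairs with its target in antiparallel orientation, its sequence, written 5' to 3', is the reverse complement of the targeted sense sequence. A minimal sketch, assuming plain A/C/G/T (or U) strings; it ignores target-site selection, length and backbone chemistry. The helper `antisense` is hypothetical.

```python
def antisense(sense: str, rna: bool = False) -> str:
    """Return the antisense (reverse complement) of a 5'->3' sense sequence.

    Output is DNA by default; with rna=True, T is written as U.
    """
    pairs = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}
    comp = "".join(pairs[base] for base in reversed(sense.upper()))
    return comp.replace("T", "U") if rna else comp


if __name__ == "__main__":
    print(antisense("ATGCC"))            # GGCAT
    print(antisense("ATGCC", rna=True))  # GGCAU
```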

Various techniques may be employed for introducing the nucleic acids into viable cells. The
techniques vary depending upon whether one is using the subject compositions in culture or in vivo
in a host. Techniques that have been found efficient include transfection with a retrovirus and
viral coat protein-liposome mediated transfection; see Dzau et al., Trends in Biotech 11, 205-210
(1993). In some situations it is desirable to provide the nucleic acid source with an agent that
targets the target cells, such as an antibody specific for a surface membrane protein on the
target cell, a ligand for a receptor on the target cell, etc. Where liposomes are employed,
proteins that bind to a surface membrane protein associated with endocytosis may be used for
targeting and/or to facilitate uptake, e.g. capsid proteins or fragments thereof tropic for a
particular cell type, antibodies for proteins that undergo internalization in cycling, and
proteins that target intracellular localization and enhance intracellular half-life. In
liposomes, the decoy concentration in the lumen will generally be in the range of about 0.1 µM to
20 µM. For other techniques, the application rate is determined empirically, using conventional
techniques to determine desired ranges. Usually, application of the subject therapeutics will be
local, so as to be administered at the site of interest. Various techniques can be used for
providing the subject compositions at the site of interest, such as injection, catheters, trocars,
projectiles, pluronic gel, stents, sustained drug release polymers or other devices that provide
for internal access. Systemic administration of the nucleic acid using lipofection or liposomes
with tissue targeting (e.g. antibody) may also be employed.

The invention also provides isolated translation products of the disclosed BRCA1 alleles that
are distinguishable from the wild-type BRCA1 gene product. For example, for alleles that encode a
truncated tumorigenic translation product, the C-terminus is used to differentiate the product
from wild-type BRCA1. Accordingly, the invention provides the translation product of BRCA1 allele
#5803 (SEQ ID NO:13), 9601 (SEQ ID NO:14), 9815 (SEQ ID NO:15), 8203 (SEQ ID NO:17), 388 (SEQ ID
NO:18), 6401 (SEQ ID NO:19), 4406 (SEQ ID NO:20), 10201 (SEQ ID NO:21), 7408 (SEQ ID NO:22), 582
(SEQ ID NO:23) or 77 (SEQ ID NO:24), or a C-terminal fragment thereof; and that of #8403 (SEQ ID
NO:16), or a fragment thereof comprising Gly at position 61.

The subject mutant and/or rare allelic BRCA1 translation products comprise an amino acid sequence
that provides a target for distinguishing the product from that of other BRCA1 alleles. Preferred
fragments are capable of eliciting the production of a peptide-specific antibody, in vivo or in
vitro, capable of distinguishing a protein comprising the immunogenic peptide from a wild-type
BRCA1 translation product. The fragments are necessarily unique to the disclosed allele
translation product, in that they are not found in any previously known protein, and have a
length at least long enough to define a novel peptide, from about 5 to about 25 residues,
preferably from 6 to 10 residues, depending on the particular amino acid sequence.
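One way to enumerate candidate immunogenic fragments is to take every window of 6 to 10 residues that covers the first residue at which the mutant protein departs from wild type, keeping only windows absent from the wild-type sequence. This sketch assumes both proteins are given as one-letter strings, checks novelty only against the wild-type sequence supplied (not against all known proteins), and uses the hypothetical name `junction_peptides`. A truncation that adds no novel residues yields no such peptide; for those products the new C-terminus itself is the distinguishing feature, as noted above.

```python
def junction_peptides(mutant: str, wild_type: str,
                      min_len: int = 6, max_len: int = 10) -> list[str]:
    """Peptides of min_len..max_len residues that span the first mutant residue
    differing from wild type and do not occur anywhere in wild_type."""
    # First differing index; if one sequence is a prefix of the other,
    # this is the length of the shorter one.
    first_diff = next(
        (i for i, (m, w) in enumerate(zip(mutant, wild_type)) if m != w),
        min(len(mutant), len(wild_type)),
    )
    peptides = []
    for n in range(min_len, max_len + 1):
        # Windows of length n that contain first_diff and fit inside the mutant.
        for start in range(max(0, first_diff - n + 1), min(first_diff, len(mutant) - n) + 1):
            pep = mutant[start:start + n]
            if pep not in wild_type:
                peptides.append(pep)
    return peptides


if __name__ == "__main__":
    wt = "ACDEFGHIKLMNPQRS"    # toy sequence, not BRCA1
    mut = "ACDEFGHGKLMNPQRS"   # Ile -> Gly at index 7
    print(junction_peptides(mut, wt, 6, 6))
```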

The subject translation products (including fragments) are either isolated, i.e. unaccompanied by
at least some of the material with which they are associated in their natural state; partially
purified, i.e. constituting at least about 1%, preferably at least about 10%, and more preferably
at least about 50% by weight of the total translation product in a given sample; or pure, i.e. at
least about 60%, preferably at least 80%, and more preferably at least about 90% by weight of
total translation product. Included in the subject translation product weight are any atoms,
molecules, groups, etc. covalently coupled to the subject translation products, such as
detectable labels, glycosylations, phosphorylations, etc. The subject translation products may be
isolated, purified, modified or joined to other compounds in a variety of ways known to those
skilled in the art, depending on what other components are present in the sample and to what, if
anything, the translation product is covalently linked.

Binding agents specific for the disclosed tumorigenic BRCA1 genes and gene products find
particular use in cancer diagnosis. The selected method of diagnosis will depend on the nature of
the tumorigenic BRCA1 mutant or rare allele and its transcription or translation product(s). For
example, soluble secreted translation products of the disclosed alleles may be detected in a
variety of physiologic fluids using a binding agent with a detectable label such as a radiolabel,
fluorescer, etc. Detection of membrane-bound or intracellular products generally requires
preliminary isolation of cells (e.g. blood cells) or tissue (e.g. breast biopsy tissue). A wide
variety of specific binding assays, e.g. ELISA, may be used.

BRCA1 gene product-specific binding agents are produced in a variety of ways using the
compositions disclosed herein. For example, structural x-ray crystallographic and/or NMR data of
the mutant and/or rare allelic BRCA1 translation products are used to rationally design binding
molecules of determined structure or complementarity. Also, the disclosed mutant and/or rare
allelic BRCA1 translation products are used as immunogens to generate specific polyclonal or
monoclonal antibodies. See Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring
Harbor Laboratory, for general methods. Specific antibodies are readily modified to a monovalent
form, such as Fab, Fab', or Fv.

Other mutant and/or rare allelic BRCA1 gene-product specific agents are screened from large
libraries of synthetic or natural compounds. For example, numerous means are available for random
and directed synthesis of saccharide, peptide, and nucleic acid based compounds. Alternatively,
libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are
available or readily producible. Additionally, natural and synthetically produced libraries and
compounds are readily modified through conventional chemical, physical, and biochemical means.
See, e.g., Houghten et al. and Lam et al. (1991) Nature 354, 84 and 81, respectively; and Blake
and Litzi-Davis (1992) Bioconjugate Chem 3, 510.

Useful binding agents are identified with assays employing a compound comprising mutant and/or
rare allelic BRCA1 peptides or encoding nucleic acids. A wide variety of in vitro, cell-free
binding assays find convenient use, especially assays for specific binding to immobilized
compounds comprising the subject nucleic acid or translation product. See, e.g., Fodor et al.
(1991) Science 251, 767 for the light-directed parallel synthesis method. Such assays are
amenable to scale-up and high-throughput use suitable for high-volume drug screening.

Useful agents are typically those that bind the targeted mutant and/or rare allelic BRCA1 gene
product with high affinity and specificity and distinguish the tumorigenic BRCA1 mutants/rare
alleles from the wild-type BRCA1 gene product. Candidate agents comprise functional chemical
groups necessary for structural interactions with proteins and/or DNA, and typically include at
least an amine, carbonyl, hydroxyl or carboxyl group, preferably at least two of the functional
chemical groups, more preferably at least three. The candidate agents often comprise cyclical
carbon or heterocyclic structures and/or aromatic or polyaromatic structures substituted with one
or more of the aforementioned functional groups. Candidate agents are also found among
biomolecules including peptides, saccharides, fatty acids, sterols, isoprenoids, purines,
pyridines, derivatives, structural analogs or combinations thereof, and the like. Where the agent
is or is encoded by a transfected nucleic acid, said nucleic acid is typically DNA or RNA.

Candidate agents are obtained from a wide variety of sources including libraries of synthetic or
natural compounds. For example, numerous means are available for random and directed synthesis of
a wide variety of organic compounds and biomolecules, including expression of randomized
oligonucleotides. Alternatively, libraries of natural compounds in the form of bacterial, fungal,
plant and animal extracts are available or readily produced. Additionally, natural and
synthetically produced libraries and compounds are readily modified through conventional chemical,
physical, and biochemical means to enhance efficacy, stability, pharmaceutical compatibility, and
the like. In addition, known pharmacological agents may be subject to directed or random chemical
modifications, such as acylation, alkylation, esterification, amidification, etc., to produce
structural analogs.

Therapeutic applications typically involve binding to and functional disruption of a tumorigenic
BRCA1 gene product by an administered high-affinity binding agent. For therapeutic uses, the
compositions and agents disclosed herein may be administered by any convenient route. Small
organics are preferably administered orally; other compositions and agents are preferably
administered parenterally, conveniently in a pharmaceutically or physiologically acceptable
carrier, e.g., phosphate buffered saline, or the like. Typically, the compositions are added to a
retained physiological fluid such as blood or synovial fluid. Generally, the amount administered
will be empirically determined, typically in the range of about 10 to 1000 µg/kg of the
recipient. For peptide agents, the concentration will generally be in the range of about 50 to
500 µg/ml in the dose administered. Other additives may be included, such as stabilizers,
bactericides, etc. These additives will be present in conventional amounts.
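The dose ranges above combine by simple arithmetic: the total amount is body weight times the per-kilogram dose, and the administered volume follows from the concentration. A sketch with assumed example values (a 70 kg recipient, 100 µg/kg, a peptide agent at 500 µg/ml); the function name `dose_and_volume` is hypothetical, and the range checks only mirror the approximate bounds stated in the text.

```python
def dose_and_volume(weight_kg: float, dose_ug_per_kg: float,
                    conc_ug_per_ml: float) -> tuple[float, float]:
    """Return (total dose in ug, volume in ml) for a weight-based dose."""
    if not 10 <= dose_ug_per_kg <= 1000:
        raise ValueError("dose outside the ~10-1000 ug/kg range given in the text")
    if not 50 <= conc_ug_per_ml <= 500:
        raise ValueError("concentration outside the ~50-500 ug/ml range given for peptides")
    total_ug = weight_kg * dose_ug_per_kg
    return total_ug, total_ug / conc_ug_per_ml


if __name__ == "__main__":
    print(dose_and_volume(70, 100, 500))  # (7000, 14.0)
```

For those inputs the total is 7000 µg, delivered in 14 ml.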

The following examples are offered by way of illustration and not by way of limitation. 


EXAMPLES 

Example 1. Positional Cloning

Contig construction

YACs. Primers flanking polymorphic repeats in the 4 Mb region of linkage were used to amplify
pools from the available CEPH, Washington University, and CEPH megaYAC libraries. Thirty-nine YACs
were selected. Of these, 23 were tested for chimerism by FISH, and 12 were found to be chimeric.
YACs were aligned to each other by attempting to amplify each YAC with primer pairs from known
sequence tagged sites (STSes). More STSes were defined by sequencing the ends of YACs, and these
new STSes were used for further alignment and YAC identification.
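Aligning YACs by shared STSes amounts to finding connected groups in a YAC-by-STS incidence table. The sketch below assumes each YAC's PCR results are recorded as a set of positive STS names; the names and the function `yac_contigs` are hypothetical, and it does not order clones within a contig or handle chimeric YACs, which (as the 12 chimeric clones above show) can falsely join distant regions.

```python
from collections import defaultdict, deque


def yac_contigs(sts_hits: dict[str, set[str]]) -> list[list[str]]:
    """Group YACs into contigs: two YACs are linked if they share any STS."""
    by_sts = defaultdict(set)  # STS name -> YACs that amplify with it
    for yac, stses in sts_hits.items():
        for sts in stses:
            by_sts[sts].add(yac)

    seen, contigs = set(), []
    for start in sorted(sts_hits):
        if start in seen:
            continue
        seen.add(start)
        queue, contig = deque([start]), []
        while queue:  # breadth-first search over shared STSes
            yac = queue.popleft()
            contig.append(yac)
            for sts in sts_hits[yac]:
                for other in by_sts[sts]:
                    if other not in seen:
                        seen.add(other)
                        queue.append(other)
        contigs.append(sorted(contig))
    return contigs


if __name__ == "__main__":
    hits = {"yA": {"s1", "s2"}, "yB": {"s2", "s3"}, "yC": {"s3"}, "yD": {"s9"}}
    print(yac_contigs(hits))  # [['yA', 'yB', 'yC'], ['yD']]
```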
Cosmids. A gridded cosmid library of chromosome 17 was prepared. Alu-Alu PCR products of YACs
were hybridized to the cosmid grids, and positively hybridizing cosmids were used for subsequent
studies. Contigs were constructed in two ways: cosmids with the same restriction patterns were
aligned, and the unique sequences flanking polymorphic markers and our sequenced cDNAs were used
as STSes.
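Restriction-pattern alignment of cosmids is commonly done by fingerprint comparison: two clones whose digests share several fragments of matching size are taken to overlap. The sketch below is one simple version of that idea, not necessarily the procedure used here; the fragment sizes, the 1% sizing tolerance and the three-fragment threshold are assumptions, and `shared_fragments` and `overlapping_pairs` are hypothetical names.

```python
def shared_fragments(a: list[int], b: list[int], tol: float = 0.01) -> int:
    """Count fragments of `a` matching a distinct fragment of `b` within relative tolerance."""
    remaining = sorted(b)
    count = 0
    for size in sorted(a):
        for j, other in enumerate(remaining):
            if abs(size - other) <= tol * max(size, other):
                count += 1
                del remaining[j]  # each fragment of b may be matched once
                break
    return count


def overlapping_pairs(fingerprints: dict[str, list[int]], min_shared: int = 3,
                      tol: float = 0.01) -> list[tuple[str, str]]:
    """Pairs of cosmids whose restriction fingerprints share >= min_shared fragments."""
    names = sorted(fingerprints)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if shared_fragments(fingerprints[x], fingerprints[y], tol) >= min_shared
    ]


if __name__ == "__main__":
    fp = {  # hypothetical fragment sizes in bp
        "cosA": [1200, 2500, 3100, 4000, 5200],
        "cosB": [2500, 3100, 4000, 6100],
        "cosC": [900, 1500, 7000],
    }
    print(overlapping_pairs(fp))  # [('cosA', 'cosB')]
```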
Physical mapping by pulsed field gel electrophoresis. Physical distances were estimated by pulsed
field gel electrophoresis (PFGE), using DNA from lymphocyte cell lines of BRCA1-linked patients and
of controls. DNA samples were digested with NotI, MluI, RsrII, NruI, SacII, and EclXI. Filters
were probed with single-copy sequences isolated from cosmids and later with cDNA clones. Multiple
unrelated linked patients and controls were screened to detect large insertions or deletions
associated with BRCA1. Results of PFGE were used to define the region
first used to screen cDNA libraries as ~1 Mb and the current linked region as ~500 kb.

Screening cDNA libraries. We began library screening when the linked region defined by meiotic
recombination was ~1 Mb. The first question was which library would optimize the length of cDNA
clones, representation of both 5' and 3' ends of genes, and the chances that BRCA1 would be
expressed. We chose a random-primed cDNA library cloned into λgt10 from cultured (not
transformed) fibroblasts from a human female. This library was selected because it had inserts
averaging 1.8 kb, with 80% of inserts between 1 and 4 kb; was constructed from cultured
fibroblasts known to be "leaky" in gene expression; and was known to include 5' ends of genes.
We simultaneously screened three other libraries (from ovary, fetal brain, and mouse mammary
epithelium). With one exception (described below), all transcripts from these libraries
cross-hybridized to transcripts from the fibroblast library.

The fibroblast library was screened with YAC DNA isolated by PFGE. Pure YAC DNA (100 ng) was
random primed with both α-32P-dATP (6000 mCi/mmole) and 32P-dCTP (3000 mCi/mmole) and used
immediately after labelling. Filters from the library were prehybridized with human placental DNA
for 24-48 hours. Labelled YAC DNA was hybridized to the filters for 48 hours at 65°C.
Approximately 250 transcripts were selected by screening with 7 YACs and were then
cross-hybridized. We also used pools of cosmids from the linked region to screen the fibroblast
library. We selected 122 transcripts and cross-hybridized them to clones previously detected by
the YACs.

Example 2. Cloning BRCA1 and its characterization

A. Screening for mutations in candidate genes. We initially identified 24 genes in the 1 Mb BRCA1
region defined by meiotic recombination and determined their respective locations on the YAC
contig, sizes of representative cDNA clones, numbers of replicates in the library, sizes of
transcripts, homologies to known genes, and variants detected. Candidate genes were characterized
in the following ways:
(1) Cross-hybridizing clones. cDNA clones isolated from the library are hybridized against each
other. Cross-hybridizing clones are considered "siblings" of the clone used as a probe and 
represent the same gene. 
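Grouping clones into sibships is a connected-components problem over the observed cross-hybridization pairs. A union-find sketch, assuming the relation is treated as transitive (if A cross-hybridizes with B and B with C, all three go into one sibship even if A and C were never tested together); the clone names and the function `sibships` are hypothetical.

```python
def sibships(clones: list[str], cross_hybridizing: list[tuple[str, str]]) -> list[list[str]]:
    """Partition clones into sibships from pairwise cross-hybridization results."""
    parent = {c: c for c in clones}

    def find(c: str) -> str:
        # Follow parent links to the set representative, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in cross_hybridizing:
        parent[find(a)] = find(b)  # merge the two sibships

    groups: dict[str, list[str]] = {}
    for c in clones:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    clones = ["c1", "c2", "c3", "c4", "c5"]
    pairs = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
    print(sibships(clones, pairs))  # [['c1', 'c2', 'c3'], ['c4', 'c5']]
```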

(2) Mapping back. At least one clone from each sibship is mapped back to total human genomic
DNA, to cosmids, to YACs, and to somatic cell hybrid lines, some of which contain deletions of 
17q and one of which has chromosome 17 as its only human chromosome. 

(3) Subcloning and sequencing. One of the longest clones from each sibship is subcloned into
M13 and sequenced manually by standard methods, constructing new primers at the end of each 
fragment to continue sequencing until the end of the clone is reached. 

(4) Extending with sibs. In order to find clones that contain more of the gene, the last
sequencing primer for the clone and primers made from λgt10 are used to amplify sibs of the first
clone. Sibs that amplify the longest fragments are selected, subcloned, and sequenced. This
process is continued until we reach the size of the transcript defined by Northern blot and/or
until the 3' sequence is a polyA tail and the 5' sequence has features of the beginning of the
coding region.

(5) Southerns. To identify insertion or deletion mutations, genomic DNA from 20 unrelated
patients from families with breast cancer linked to 17q (i.e. "linked patients") and from controls
is digested with BamHI/TaqI and independently with HindIII/HinfI. Each cDNA clone is used to
screen Southern blots. Variants have been detected in two genes. Both of these variants are RFLPs,
occurring at equal frequency in linked patients and in controls.

(6) Northerns. To identify splice mutations and/or length mutations, we prepared total RNA and
poly(A)+ RNA from lymphoblast lines (germline) of 20 unrelated linked patients, from ovarian and
breast tissues, from fibroblasts, from a HeLa cell line, and from breast cancer cell lines.
Northern blots are screened with each gene.

(7) Detection of small mutations. To screen for germline point mutations in patients without
encountering introns, we prepared cDNA from poly(A)+ mRNA from lymphoblast cell lines of 20
unrelated linked patients and from controls. cDNA has also been made from 65 malignant ovarian
cancers from patients not selected for family history. Primers are constructed every ~200
basepairs along the sequence and used to amplify these cDNAs. Genomic DNA has also been prepared
from cell lines from all family members (linked and unlinked), from malignant and normal cells
from paraffin blocks from their breast and ovarian surgeries, and from malignant and normal cells
from 29 breast tumors not selected for family history. For sequences without introns, cDNA and
gDNA lengths are equal, and the gDNA samples are amplified as well.
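Placing primers every ~200 bp divides each cDNA into consecutive amplicons. The sketch below assumes a fixed step and an optional overlap, so that bases lying under a primer in one amplicon are interrogated inside its neighbor; the function name `tile_amplicons` and the overlap parameter are assumptions, not part of the described protocol.

```python
def tile_amplicons(seq_len: int, step: int = 200, overlap: int = 0) -> list[tuple[int, int]]:
    """Split [0, seq_len) into amplicons starting every `step` bases (0-based, end-exclusive).

    Each amplicon is extended by `overlap` bases into the next one.
    """
    if step <= 0 or overlap < 0:
        raise ValueError("step must be positive and overlap non-negative")
    windows = []
    start = 0
    while start < seq_len:
        end = min(start + step + overlap, seq_len)
        windows.append((start, end))
        if end == seq_len:
            break
        start += step
    return windows


if __name__ == "__main__":
    print(tile_amplicons(1000, overlap=20))
    # [(0, 220), (200, 420), (400, 620), (600, 820), (800, 1000)]
```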

Two mutation detection methods are used to screen each sequence. Amplified products are screened
for SSCPs using modifications that enable electrophoresis to be done with only one set of running
conditions (Keen et al. 1991 Trends Genet 7:5; Soto and Sukumar 1992 PCR Meth Appl 2:96-98). In
order to screen longer segments of DNA (100-1500 bp) and to detect variants missed by SSCP,
sequences are also screened for point mutations by chemical cleavage of mismatches (CCM; Cotton
1993 Mutation Res 285:125-144), using essentially the protocol of Grompe et al. 1989 Proc Natl
Acad Sci USA 86:5888-5892. An endonuclease developed for mismatch detection reduces the toxicity
of the method (Youil et al. 1993 Amer J Hum Genet 53 (supplement): abstract 1257).
(8) Polymorphism or mutation. Variants are screened in cases and controls to distinguish
polymorphisms from a critical mutation. Linkage of breast cancer to each variant is tested in all
informative families.
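A variant that is a neutral polymorphism should occur at similar frequency in cases and controls, as with the two RFLPs noted above. One standard way to compare carrier counts is Fisher's exact test on a 2x2 table; the application does not specify a statistic, so this choice, the example counts and the function name are assumptions. The sketch uses only the standard library (`math.comb`, Python 3.8+).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: carriers of the variant, non-carriers.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Sum every table no more probable than the observed one; the small relative
    # tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(fisher_exact_two_sided(2, 18, 2, 18))   # equal carrier frequency: p is ~1
    print(fisher_exact_two_sided(10, 10, 0, 20))  # carriers only among cases: small p
```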

Example 3. Characterization of BRCA1 mutations in germline DNA and breast cancer patients' tumors.

A. BRCA1 mutations in chromosome 17q-linked families. Our series of families includes
20 large extended kindreds in which breast and ovarian cancer (and in one family prostatic cancer) 
are linked to 17q21, with individual lod scores > 1.5. Since linked patients in these families carry 
mutations in BRCA1, we have identified their mutations first. 
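For phase-known, fully informative meioses, the lod score at recombination fraction θ compares the likelihood of the observed recombinants (R) and non-recombinants (NR) under linkage with that under free recombination: Z(θ) = log10[θ^R (1-θ)^NR / 0.5^(R+NR)]. Real pedigrees with unknown phase or missing genotypes require full pedigree-likelihood calculations; this sketch, with hypothetical function names, covers only the simple case.

```python
import math


def lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """log10 likelihood ratio for linkage at theta vs. theta = 0.5 (phase-known meioses)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    def log_term(count: int, p: float) -> float:
        if count == 0:
            return 0.0  # p**0 == 1, even when p == 0
        return -math.inf if p == 0.0 else count * math.log10(p)

    n = recombinants + nonrecombinants
    return (log_term(recombinants, theta)
            + log_term(nonrecombinants, 1.0 - theta)
            - n * math.log10(0.5))


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.01) -> tuple[float, float]:
    """Maximize the lod over a grid of theta from 0 to 0.5; returns (lod, theta)."""
    grid = [min(i * step, 0.5) for i in range(int(round(0.5 / step)) + 1)]
    return max((lod(recombinants, nonrecombinants, t), t) for t in grid)


if __name__ == "__main__":
    print(max_lod(0, 5))  # about (1.505, 0.0)
```

Five informative meioses with no recombinants give a maximum lod of about 1.505, just above the 1.5 threshold used above.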



WO 96/33271    PCT/US96/05621

Family  Exon      U14680 nt                Mutation                                           Amino acid change  Predicted effect
5803    3         200-253                  exon 3 deleted (54 bp)                             27 Stop            protein truncation
9601    3         230                      deletion AA                                        39 Stop            protein truncation
9815    Intron 5  splice donor, bp +1      substitution G to A; 22 bp deletion in RNA         64 Stop            protein truncation
8403    5         300                      substitution T to G                                Cys61Gly           loss of zinc-binding motif
8203    Intron 5  splice acceptor, bp -11  substitution T to G; 59 bp intron insertion in RNA 81 Stop            protein truncation
388     11        1048                     deletion A                                         313 Stop           protein truncation
6401    11        2415                     deletion AG                                        Ser766Stop         protein truncation
4406    11        2800                     deletion AA                                        901 Stop           protein truncation
10201   11        2863                     deletion TC                                        Ser915Stop         protein truncation
7408    11        3726                     substitution C to T                                Arg1203Stop        protein truncation
582     11        4184                     deletion TCAA                                      1364 Stop          protein truncation
77      24        5677                     insertion A                                        Tyr1853Stop        protein truncation
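The codon numbers in the table can be cross-checked against the nucleotide positions. This is a minimal sketch, assuming that nt 120 of U14680 is the A of the ATG start codon; that is consistent with SEQ ID NO: 1 in the sequence listing, where the row ending at position 120 (...CAGAAAGAAA) is followed by TGGATTTATC. The constant and function names are hypothetical, and the check applies only to rows whose amino acid change names a codon.

```python
CDS_START_NT = 120  # assumption: nt 120 of U14680 is the A of the ATG start codon


def codon_position(nt: int) -> tuple[int, int]:
    """Map a U14680 nucleotide to (codon number, base 1-3 within the codon)."""
    if nt < CDS_START_NT:
        raise ValueError(f"nt {nt} lies upstream of the start codon")
    offset = nt - CDS_START_NT
    return offset // 3 + 1, offset % 3 + 1


# (U14680 nt, codon named in the amino acid change column)
TABLE_ROWS = [(300, 61), (2415, 766), (2863, 915), (3726, 1203), (5677, 1853)]

for nt, listed in TABLE_ROWS:
    codon, base = codon_position(nt)
    status = "matches" if codon == listed else "DIFFERS"
    print(f"nt {nt}: codon {codon}, base {base} ({status} table)")
```

Under this assumption all five codon-numbered rows agree with the table; for example, nt 300 is the first base of codon 61, consistent with a T to G change at that base producing Cys61Gly.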



Germline BRCA1 mutations among breast cancer patients in the general population.



From each breast cancer patient not selected for family history, a 30 ml sample of whole
blood is drawn into acid citrate dextrose. DNA is extracted from the blood and stored at -70 °C
in three aliquots. Germline BRCA1 mutations are identified using the approaches described above
and by directly sequencing new mutations. Paraffin-embedded tumor specimens from the same
patients are screened for alterations of p53, HER2, PRAD1, and ER, and the tumor blocks are
also tested for each patient's germline BRCA1 mutation.
A preliminary estimate of the risk associated with different BRCA1 mutations is obtained from
relatives of patients with germline alterations. For each patient with a germline BRCA1 mutation,
DNA is extracted from a blood sample from the surviving mother and each surviving sister (and, for
older patients, each brother as well) and tested for the presence of the proband's BRCA1 mutation.
To ascertain men at risk of prostatic cancer, brothers of breast cancer patients diagnosed after
age 55 are also interviewed and sampled. Paraffin blocks from deceased relatives who had cancer
are also screened. The frequency of breast, ovarian, or prostatic cancer among relatives carrying
a given BRCA1 mutation provides a first estimate of the risk of these cancers associated with that
mutation.
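The tally described above can be sketched as follows. This is an illustration under stated assumptions: each tested relative is recorded as a hypothetical (mutation, is_carrier, diagnosis) tuple with at most one cancer site, only carriers of the proband's mutation are counted, and the Wilson score interval is our addition to show how imprecise a first estimate from a few relatives is. The function names and the data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def carrier_cancer_frequencies(relatives):
    """relatives: iterable of (mutation, is_carrier, diagnosis) tuples, one per
    tested relative; diagnosis is a cancer site such as "breast", "ovarian" or
    "prostate", or None if unaffected.

    Returns {mutation: {site: (affected carriers, tested carriers, fraction)}}.
    Only relatives who carry the proband's mutation are counted.
    """
    carriers = Counter()
    affected = defaultdict(Counter)
    for mutation, is_carrier, diagnosis in relatives:
        if not is_carrier:
            continue
        carriers[mutation] += 1
        if diagnosis is not None:
            affected[mutation][diagnosis] += 1
    return {
        m: {site: (k, carriers[m], k / carriers[m]) for site, k in affected[m].items()}
        for m in carriers
    }


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion k/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical relatives of probands carrying two of the mutations in the table.
relatives = [
    ("4184delTCAA", True, "breast"),
    ("4184delTCAA", True, None),
    ("4184delTCAA", False, "breast"),   # non-carrier: excluded
    ("4184delTCAA", True, "ovarian"),
    ("Cys61Gly", True, "breast"),
]
for mutation, sites in carrier_cancer_frequencies(relatives).items():
    for site, (k, n, frac) in sites.items():
        lo, hi = wilson_interval(k, n)
        print(f"{mutation} {site}: {k}/{n} = {frac:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With only a handful of carrier relatives per mutation the intervals are very wide, which is why the text treats these frequencies as a first estimate.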

Somatic alterations of BRCA1 in breast tumors.

Malignant cells are dissected away from normal cells in paraffin blocks. By identifying
BRCA1 mutations in these tumor series, we estimate the frequency of somatic BRCA1 alterations,
determine which BRCA1 mutations are characteristic of particular stages of tumor development, and
evaluate their association with prognosis.

Characterizing mutant and rare alleles of BRCA1.

The function and developmental expression pattern of mutant or rare BRCA1 alleles are
characterized using transformed cells expressing the allele and knockout or transgenic mice. For
example, phenotypic changes in the animal or cell line, such as growth rate and anchorage
independence, are determined. In addition, several methods are used to study loss-of-function
mutations, including replacing normal genes with their mutant alleles (BRCA1-/BRCA1-) by
homologous recombination in embryonic stem (ES) cells and replacing mutant alleles with their
normal counterparts in differentiated cultured cells (Capecchi 1989 Science 244:1288-1292;
Weissman et al. 1987 Science 236:175-180; Wang et al. 1993 Oncogene 8:279-288). Breast carcinoma
cell lines are screened for mutations at the BRCA1 locus, and a BRCA1-mutant line is selected.
Normal and mutant BRCA1 cDNAs are subcloned into an expression vector carrying genes that confer
resistance to ampicillin and geneticin (Baker et al. 1990 Science 249:912-915). The subclones are
transfected into BRCA1-mutant breast cancer cells, and geneticin-resistant colonies are isolated
and examined for any change in tumorigenic phenotype, such as colony formation in soft agar,
increased growth rate, and/or tumor formation in athymic nude mice. In vivo functional
demonstrations involve introducing the normal BRCA1 gene into a breast carcinoma cell line that is
mutant at BRCA1 and injecting these BRCA1+ cells into nude mice. Differences in tumorigenic growth
relative to nude mice injected with BRCA1-mutant breast carcinoma cells are readily observed; for
example, correcting the mutant gene decreases the ability of the breast carcinoma cells to form
tumors in nude mice (Weissman et al. 1987; Wang et al. 1993).

All publications and patent applications cited in this specification are herein incorporated 
by reference as if each individual publication or patent application were specifically and 
individually indicated to be incorporated by reference. Although the foregoing invention has been
described in some detail by way of illustration and example for purposes of clarity of 
understanding, it will be readily apparent to those of ordinary skill in the art in light of the 
teachings of this invention that certain changes and modifications may be made thereto without 
departing from the spirit or scope of the appended claims. 
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: KING, Mary-Claire
FRIEDMAN, Lori
OSTERMEYER, Beth
ROWELL, Sarah
LYNCH, Eric
SZABO, Csilla
LEE, Ming

(ii) TITLE OF INVENTION: GENETIC MARKERS FOR BREAST AND OVARIAN CANCER

(iii) NUMBER OF SEQUENCES: 24
(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Science & Technology Law Group
(B) STREET: 268 Bush Street, Suite 3200
(C) CITY: San Francisco
(D) STATE: California
(E) COUNTRY: USA
(F) ZIP: 94104

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: US
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: OSMAN, Richard A
(B) REGISTRATION NUMBER: 36,627
(C) REFERENCE/DOCKET NUMBER: A-59563-3/RAO

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (415) 343-4341
(B) TELEFAX: (415) 343-4342
(C) TELEX:

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5656 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180
TCTTAGAGTG TCCCATCTGA TTTTGCATGC TGAAACTTCT CAACCAGAAG AAAGGGCCTT 240
CACAGTGTCC TTTATGTAAG AATGATATAA CCAAAAGGAG CCTACAAGAA AGTACGAGAT 300
TTAGTCAACT TGTTGAAGAG CTATTGAAAA TCATTTGTGC TTTTCAGCTT GACACAGGTT 360
TGGAGTATGC AAACAGCTAT AATTTTGCAA AAAAGGAAAA TAACTCTCCT GAACATCTAA 420
AAGATGAAGT TTCTATCATC CAAAGTATGG GCTACAGAAA CCGTGCCAAA AGACTTCTAC 480
AGAGTGAACC CGAAAATCCT TCCTTGCAGG AAACCAGTCT CAGTGTCCAA CTCTCTAACC 540
TTGGAACTGT GAGAACTCTG AGGACAAAGC AGCGGATACA ACCTCAAAAG ACGTCTGTCT 600
ACATTGAATT GGGATCTGAT TCTTCTGAAG ATACCGTTAA TAAGGCAACT TATTGCAGTG 660
TGGGAGATCA AGAATTGTTA CAAATCACCC CTCAAGGAAC CAGGGATGAA ATCAGTTTGG 720
ATTCTGCAAA AAAGGCTGCT TGTGAATTTT CTGAGACGGA TGTAACAAAT ACTGAACATC 780
ATCAACCCAG TAATAATGAT TTGAACACCA CTGAGAAGCG TGCAGCTGAG AGGCATCCAG 840
AAAAGTATCA GGGTAGTTCT GTTTCAAACT TGCATGTGGA GCCATGTGGC ACAAATACTC 900
ATGCCAGCTC ATTACAGCAT GAGAACAGCA GTTTATTACT CACTAAAGAC AGAATGAATG 960
TAGAAAAGGC TGAATTCTGT AATAAAAGCA AACAGCCTGG CTTAGCAAGG AGCCAACATA 1020
ACAGATGGGC TGGAAGTAAG GAAACATGTA ATGATAGGCG GACTCCCAGC ACAGAAAAAA 1080
AGGTAGATCT GAATGCTGAT CCCCTGTGTG AGAGAAAAGA ATGGAATAAG CAGAAACTGC 1140
CATGCTCAGA GAATCCTAGA GATACTGAAG ATGTTCCTTG GATAACACTA AATAGCAGCA 1200
TTCAGAAAGT TAATGAGTGG TTTTCCAGAA GTGATGAACT GTTAGGTTCT GATGACTCAC 1260
ATGATGGGGA GTCTGAATCA AATGCCAAAG TAGCTGATGT ATTGGACGTT CTAAATGAGG 1320
TAGATGAATA TTCTGGTTCT TCAGAGAAAA TAGACTTACT GGCCAGTGAT CCTCATGAGG 1380
CTTTAATATG TAAAAGTGAA AGAGTTCACT CCAAATCAGT AGAGAGTAAT ATTGAAGACA 1440
AAATATTTGG GAAAACCTAT CGGAAGAAGG CAAGCCTCCC CAACTTAAGC CATGTAACTG 1500
AAAATCTAAT TATAGGAGCA TTTGTTACTG AGCCACAGAT AATACAAGAG CGTCCCCTCA 1560
CAAATAAATT AAAGCGTAAA AGGAGACCTA CATCAGGCCT TCATCCTGAG GATTTTATCA 1620
AGAAAGCAGA TTTGGCAGTT CAAAAGACTC CTGAAATGAT AAATCAGGGA ACTAACCAAA 1680
CGGAGCAGAA TGGTCAAGTG ATGAATATTA CTAATAGTGG TCATGAGAAT AAAACAAAAG 1740
GTGATTCTAT TCAGAATGAG AAAAATCCTA ACCCAATAGA ATCACTCGAA AAAGAATCTG 1800
CTTTCAAAAC GAAAGCTGAA CCTATAAGCA GCAGTATAAG CAATATGGAA CTCGAATTAA 1860
ATATCCACAA TTCAAAAGCA CCTAAAAAGA ATAGGCTGAG GAGGAAGTCT TCTACCAGGC 1920
ATATTCATGC GCTTGAACTA GTAGTCAGTA GAAATCTAAG CCCACCTAAT TGTACTGAAT 1980
TGCAAATTGA TAGTTGTTCT AGCAGTGAAG AGATAAAGAA AAAAAAGTAC AACCAAATGC 2040
CAGTCAGGCA CAGCAGAAAC CTACAACTCA TGGAAGGTAA AGAACCTGCA ACTGGAGCCA 2100
AGAAGAGTAA CAAGCCAAAT GAACAGACAA GTAAAAGACA TGACAGCGAT ACTTTCCCAG 2160
AGCTGAAGTT AACAAATGCA CCTGGTTCTT TTACTAAGTG TTCAAATACC AGTGAACTTA 2220
AAGAATTTGT CAATCCTAGC CTTCCAAGAG AAGAAAAAGA AGAGAAACTA GAAACAGTTA 2280
AAGTGTCTAA TAATGCTGAA GACCCCAAAG ATCTCATGTT AAGTGGAGAA AGGGTTTTGC 2340
AAACTGAAAG ATCTGTAGAG AGTAGCAGTA TTTCATTGGT ACCTGGTACT GATTATGGCA 2400
CTCAGGAAAG TATCTCGTTA CTGGAAGTTA GCACTCTAGG GAAGGCAAAA ACAGAACCAA 2460
ATAAATGTGT GAGTCAGTGT GCAGCATTTG AAAACCCCAA GGGACTAATT CATGGTTGTT 2520
CCAAAGATAA TAGAAATGAC ACAGAAGGCT TTAAGTATCC ATTGGGACAT GAAGTTAACC 2580
ACAGTCGGGA AACAAGCATA GAAATGGAAG AAAGTGAACT TGATGCTCAG TATTTGCAGA 2640
ATACATTCAA GGTTTCAAAG CGCCAGTCAT TTGCTCCGTT TTCAAATCCA GGAAATGCAG 2700
AAGAGGAATG TGCAACATTC TCTGCCCACT CTGGGTCCTT AAAGAAACAA AGTCCAAAAG 2760
TCACTTTTGA ATGTGAACAA AAGGAAGAAA ATCAAGGAAA GAATGAGTCT AATATCAAGC 2820
CTGTACAGAC AGTTAATATC ACTGCAGGCT TTCCTGTGGT TGGTCAGAAA GATAAGCCAG 2880
TTGATAATGC CAAATGTAGT ATCAAAGGAG GCTCTAGGTT TTGTCTATCA TCTCAGTTCA 2940
GAGGCAACGA AACTGGACTC ATTACTCCAA ATAAACATGG ACTTTTACAA AACCCATATC 3000
GTATACCACC ACTTTTTCCC ATCAAGTCAT TTGTTAAAAC TAAATGTAAG AAAAATCTGC 3060
TAGAGGAAAA CTTTGAGGAA CATTCAATGT CACCTGAAAG AGAAATGGGA AATGAGAACA 3120
TTCCAAGTAC AGTGAGCACA ATTAGCCGTA ATAACATTAG AGAAAATGTT TTTAAAGAAG 3180
CCAGCTCAAG CAATATTAAT GAAGTAGGTT CCAGTACTAA TGAAGTGGGC TCCAGTATTA 3240
ATGAAATAGG TTCCAGTGAT GAAAACATTC AAGCAGAACT AGGTAGAAAC AGAGGGCCAA 3300
AATTGAATGC TATGCTTAGA TTAGGGGTTT TGCAACCTGA GGTCTATAAA CAAAGTCTTC 3360
CTGGAAGTAA TTGTAAGCAT CCTGAAATAA AAAAGCAAGA ATATGAAGAA GTAGTTCAGA 3420
CTGTTAATAC AGATTTCTCT CCATATCTGA TTTCAGATAA CTTAGAACAG CCTATGGGAA 3480
GTAGTCATGC ATCTCAGGTT TGTTCTGAGA CACCTGATGA CCTGTTAGAT GATGGTGAAA 3540
TAAAGGAAGA TACTAGTTTT GCTGAAAATG ACATTAAGGA AAGTTCTGCT GTTTTTAGCA 3600
AAAGCGTCCA GAAAGGAGAG CTTAGCAGGA GTCCTAGCCC TTTCACCCAT ACACATTTGG 3660
CTCAGGGTTA CCGAAGAGGG GCCAAGAAAT TAGAGTCCTC AGAAGAGAAC TTATCTAGTG 3720
AGGATGAAGA GCTTCCCTGC TTCCAACACT TGTTATTTGG TAAAGTAAAC AATATACCTT 3780
CTCAGTCTAC TAGGCATAGC ACCGTTGCTA CCGAGTGTCT GTCTAAGAAC ACAGAGGAGA 3840
ATTTATTATC ATTGAAGAAT AGCTTAAATG ACTGCAGTAA CCAGGTAATA TTGGCAAAGG 3900
CATCTCAGGA ACATCACCTT AGTGAGGAAA CAAAATGTTC TGCTAGCTTG TTTTCTTCAC 3960
AGTGCAGTGA ATTGGAAGAC TTGACTGCAA ATACAAACAC CCAGGATCCT TTCTTGATTG 4020
GTTCTTCCAA ACAAATGAGG CATCAGTCTG AAAGCCAGGG AGTTGGTCTG AGTGACAAGG 4080
AATTGGTTTC AGATGATGAA GAAAGAGGAA CGGGCTTGGA AGAAAATAAT CAAGAAGAGC 4140
AAAGCATGGA TTCAAACTTA GGTGAAGCAG CATCTGGGTG TGAGAGTGAA ACAAGCGTCT 4200
CTGAAGACTG CTCAGGGCTA TCCTCTCAGA GTGACATTTT AACCACTCAG CAGAGGGATA 4260
CCATGCAACA TAACCTGATA AAGCTCCAGC AGGAAATGGC TGAACTAGAA GCTGTGTTAG 4320
AACAGCATGG GAGCCAGCCT TCTAACAGCT ACCCTTCCAT CATAAGTGAC TCTTCTGCCC 4380

TTGAGGACCT GCGAAATCCA GAACAAAGCA CATCAGAAAA AGCAGTATTA ACTTCACAGA 4440 

AAAGTAGTGA ATACCCTATA AGCCAGAATC CAGAAGGCCT TTCTGCTGAC AAGTTTGAGG 4500 

TGTCTGCAGA TAGTTCTACC AGTAAAAATA AAGAACCAGG AGTGGAAAGG TCATCCCCTT 4560 

CTAAATGCCC ATCATTAGAT GATAGGTGGT ACATGCACAG TTGCTCTGGG AGTCTTCAGA 4620 

ATAGAAACTA CCCATCTCAA GAGGAGCTCA TTAAGGTTGT TGATGTGGAG GAGCAACAGC 4680 
TGGAAGAGTC TGGGCCACAC GATTTGACGG AAACATCTTA CTTGCCAAGG CAAGATCTAG 4740

AGGGAACCCC TTACCTGGAA TCTGGAATCA GCCTCTTCTC TGATGACCCT GAATCTGATC 4800 

CTTCTGAAGA CAGAGCCCCA GAGTCAGCTC GTGTTGGCAA CATACCATCT TCAACCTCTG 4860 

CATTGAAAGT TCCCCAATTG AAAGTTGCAG AATCTGCCCA GAGTCCAGCT GCTGCTCATA 4920 

CTACTGATAC TGCTGGGTAT AATGCAATGG AAGAAAGTGT GAGCAGGGAG AAGCCAGAAT 4980 

TGACAGCTTC AACAGAAAGG GTCAACAAAA GAATGTCCAT GGTGGTGTCT GGCCTGACCC 5040 

CAGAAGAATT TATGCTCGTG TACAAGTTTG CCAGAAAACA CCACATCACT TTAACTAATC 5100

TAATTACTGA AGAGACTACT CATGTTGTTA TGAAAACAGA TGCTGAGTTT GTGTGTGAAC 5160 

GGACACTGAA ATATTTTCTA GGAATTGCGG GAGGAAAATG GGTAGTTAGC TATTTCTGGG 5220 

TGACCCAGTC TATTAAAGAA AGAAAAATGC TGAATGAGCA TGATTTTGAA GTCAGAGGAG 5280 

ATGTGGTCAA TGGAAGAAAC CACCAAGGTC CAAAGCGAGC AAGAGAATCC CAGGACAGAA 5340 

AGATCTTCAG GGGGCTAGAA ATCTGTTGCT ATGGGCCCTT CACCAACATG CCCACAGATC 5400

AACTGGAATG GATGGTACAG CTGTGTGGTG CTTCTGTGGT GAAGGAGCTT TCATCATTCA 5460 

CCCTTGGCAC AGGTGTCCAC CCAATTGTGG TTGTGCAGCC AGATGCCTGG ACAGAGGACA 5520 

ATGGCTTCCA TGCAATTGGG CAGATGTGTG AGGCACCTGT GGTGACCCGA GAGTGGGTGT 5580 

TGGACAGTGT AGCACTCTAC CAGTGCCAGG AGCTGGACAC CTACCTGATA CCCCAGATCC 5640 

CCCACAGCCA CTACTG 5656
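Each row of a sequence description lists bases in blocks of ten followed by the position of the row's last base. A minimal sketch (the function name is hypothetical) that rejoins such rows and checks every trailing position against the running base count will flag a mis-copied row; it assumes rows contain only A, C, G, T or N and skips blank spacer lines. A short final row, such as the last one above, is handled naturally by the running count.

```python
def parse_listing_rows(rows):
    """Join sequence-listing rows such as
    'AGCTCGCTGA GACTTCCTGG ... TAACTGGGCC 60' into one string, checking that
    each row's trailing number equals the running count of bases."""
    parts, length = [], 0
    for row in rows:
        tokens = row.split()
        if not tokens:
            continue  # blank spacer lines between rows
        *blocks, coord = tokens
        bases = "".join(blocks).upper()
        if not coord.isdigit() or not bases or set(bases) - set("ACGTN"):
            raise ValueError(f"unparseable row: {row!r}")
        length += len(bases)
        if length != int(coord):
            raise ValueError(f"row ends at base {length}, listing says {coord}")
        parts.append(bases)
    return "".join(parts)


rows = [
    "AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60",
    "",
    "CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120",
]
seq = parse_listing_rows(rows)
print(len(seq), seq[-10:])  # 120 CAGAAAGAAA
```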
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5709 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120 

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 
TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA GTGTGACCAC 240

ATATTTTGCA AATTTTGCAT GCTGAAACTT CTCAACCAGA AGAAAGGGCC TTCACAGTGT 300 

CCTTTATGTA AGAATGATAT AACCAAAAGG AGCCTACAAG AAAGTACGAG ATTTAGTCAA 360 

CTTGTTGAAG AGCTATTGAA AATCATTTGT GCTTTTCAGC TTGACACAGG TTTGGAGTAT 420 

GCAAACAGCT ATAATTTTGC AAAAAAGGAA AATAACTCTC CTGAACATCT AAAAGATGAA 480 

GTTTCTATCA TCCAAAGTAT GGGCTACAGA AACCGTGCCA AAAGACTTCT ACAGAGTGAA 540 

CCCGAAAATC CTTCCTTGCA GGAAACCAGT CTCAGTGTCC AACTCTCTAA CCTTGGAACT 600 

GTGAGAACTC TGAGGACAAA GCAGCGGATA CAACCTCAAA AGACGTCTGT CTACATTGAA 660 

TTGGGATCTG ATTCTTCTGA AGATACCGTT AATAAGGCAA CTTATTGCAG TGTGGGAGAT 720 

CAAGAATTGT TACAAATCAC CCCTCAAGGA ACCAGGGATG AAATCAGTTT GGATTCTGCA 780 

AAAAAGGCTG CTTGTGAATT TTCTGAGACG GATGTAACAA ATACTGAACA TCATCAACCC 840 

AGTAATAATG ATTTGAACAC CACTGAGAAG CGTGCAGCTG AGAGGCATCC AGAAAAGTAT 900 

CAGGGTAGTT CTGTTTCAAA CTTGCATGTG GAGCCATGTG GCACAAATAC TCATGCCAGC 960 

TCATTACAGC ATGAGAACAG CAGTTTATTA CTCACTAAAG ACAGAATGAA TGTAGAAAAG 1020 

GCTGAATTCT GTAATAAAAG CAAACAGCCT GGCTTAGCAA GGAGCCAACA TAACAGATGG 1080 

GCTGGAAGTA AGGAAACATG TAATGATAGG CGGACTCCCA GCACAGAAAA AAAGGTAGAT 1140 

CTGAATGCTG ATCCCCTGTG TGAGAGAAAA GAATGGAATA AGCAGAAACT GCCATGCTCA 1200 

GAGAATCCTA GAGATACTGA AGATGTTCCT TGGATAACAC TAAATAGCAG CATTCAGAAA 1260 

GTTAATGAGT GGTTTTCCAG AAGTGATGAA CTGTTAGGTT CTGATGACTC ACATGATGGG 1320 

GAGTCTGAAT CAAATGCCAA AGTAGCTGAT GTATTGGACG TTCTAAATGA GGTAGATGAA 1380 

TATTCTGGTT CTTCAGAGAA AATAGACTTA CTGGCCAGTG ATCCTCATGA GGCTTTAATA 1440 

TGTAAAAGTG AAAGAGTTCA CTCCAAATCA GTAGAGAGTA ATATTGAAGA CAAAATATTT 1500 

GGGAAAACCT ATCGGAAGAA GGCAAGCCTC CCCAACTTAA GCCATGTAAC TGAAAATCTA 1560 

ATTATAGGAG CATTTGTTAC TGAGCCACAG ATAATACAAG AGCGTCCCCT CACAAATAAA 1620 

TTAAAGCGTA AAAGGAGACC TACATCAGGC CTTCATCCTG AGGATTTTAT CAAGAAAGCA 1680 

GATTTGGCAG TTCAAAAGAC TCCTGAAATG ATAAATCAGG GAACTAACCA AACGGAGCAG 1740 

AATGGTCAAG TGATGAATAT TACTAATAGT GGTCATGAGA ATAAAACAAA AGGTGATTCT 1800 

ATTCAGAATG AGAAAAATCC TAACCCAATA GAATCACTCG AAAAAGAATC TGCTTTCAAA 1860

ACGAAAGCTG AACCTATAAG CAGCAGTATA AGCAATATGG AACTCGAATT AAATATCCAC 1920 

AATTCAAAAG CACCTAAAAA GAATAGGCTG AGGAGGAAGT CTTCTACCAG GCATATTCAT 1980 

GCGCTTGAAC TAGTAGTCAG TAGAAATCTA AGCCCACCTA ATTGTACTGA ATTGCAAATT 2040 

GATAGTTGTT CTAGCAGTGA AGAGATAAAG AAAAAAAAGT ACAACCAAAT GCCAGTCAGG 2100 
CACAGCAGAA ACCTACAACT CATGGAAGGT AAAGAACCTG CAACTGGAGC CAAGAAGAGT 2160
AACAAGCCAA ATGAACAGAC AAGTAAAAGA CATGACAGCG ATACTTTCCC AGAGCTGAAG 2220
TTAACAAATG CACCTGGTTC TTTTACTAAG TGTTCAAATA CCAGTGAACT TAAAGAATTT 2280
GTCAATCCTA GCCTTCCAAG AGAAGAAAAA GAAGAGAAAC TAGAAACAGT TAAAGTGTCT 2340
AATAATGCTG AAGACCCCAA AGATCTCATG TTAAGTGGAG AAAGGGTTTT GCAAACTGAA 2400
AGATCTGTAG AGAGTAGCAG TATTTCATTG GTACCTGGTA CTGATTATGG CACTCAGGAA 2460
AGTATCTCGT TACTGGAAGT TAGCACTCTA GGGAAGGCAA AAACAGAACC AAATAAATGT 2520
GTGAGTCAGT GTGCAGCATT TGAAAACCCC AAGGGACTAA TTCATGGTTG TTCCAAAGAT 2580
AATAGAAATG ACACAGAAGG CTTTAAGTAT CCATTGGGAC ATGAAGTTAA CCACAGTCGG 2640
GAAACAAGCA TAGAAATGGA AGAAAGTGAA CTTGATGCTC AGTATTTGCA GAATACATTC 2700
AAGGTTTCAA AGCGCCAGTC ATTTGCTCCG TTTTCAAATC CAGGAAATGC AGAAGAGGAA 2760
TGTGCAACAT TCTCTGCCCA CTCTGGGTCC TTAAAGAAAC AAAGTCCAAA AGTCACTTTT 2820
GAATGTGAAC AAAAGGAAGA AAATCAAGGA AAGAATGAGT CTAATATCAA GCCTGTACAG 2880
ACAGTTAATA TCACTGCAGG CTTTCCTGTG GTTGGTCAGA AAGATAAGCC AGTTGATAAT 2940
GCCAAATGTA GTATCAAAGG AGGCTCTAGG TTTTGTCTAT CATCTCAGTT CAGAGGCAAC 3000
GAAACTGGAC TCATTACTCC AAATAAACAT GGACTTTTAC AAAACCCATA TCGTATACCA 3060
CCACTTTTTC CCATCAAGTC ATTTGTTAAA ACTAAATGTA AGAAAAATCT GCTAGAGGAA 3120
AACTTTGAGG AACATTCAAT GTCACCTGAA AGAGAAATGG GAAATGAGAA CATTCCAAGT 3180
ACAGTGAGCA CAATTAGCCG TAATAACATT AGAGAAAATG TTTTTAAAGA AGCCAGCTCA 3240
AGCAATATTA ATGAAGTAGG TTCCAGTACT AATGAAGTGG GCTCCAGTAT TAATGAAATA 3300
GGTTCCAGTG ATGAAAACAT TCAAGCAGAA CTAGGTAGAA ACAGAGGGCC AAAATTGAAT 3360
GCTATGCTTA GATTAGGGGT TTTGCAACCT GAGGTCTATA AACAAAGTCT TCCTGGAAGT 3420
AATTGTAAGC ATCCTGAAAT AAAAAAGCAA GAATATGAAG AAGTAGTTCA GACTGTTAAT 3480
ACAGATTTCT CTCCATATCT GATTTCAGAT AACTTAGAAC AGCCTATGGG AAGTAGTCAT 3540
GCATCTCAGG TTTGTTCTGA GACACCTGAT GACCTGTTAG ATGATGGTGA AATAAAGGAA 3600
GATACTAGTT TTGCTGAAAA TGACATTAAG GAAAGTTCTG CTGTTTTTAG CAAAAGCGTC 3660
CAGAAAGGAG AGCTTAGCAG GAGTCCTAGC CCTTTCACCC ATACACATTT GGCTCAGGGT 3720
TACCGAAGAG GGGCCAAGAA ATTAGAGTCC TCAGAAGAGA ACTTATCTAG TGAGGATGAA 3780
GAGCTTCCCT GCTTCCAACA CTTGTTATTT GGTAAAGTAA ACAATATACC TTCTCAGTCT 3840
ACTAGGCATA GCACCGTTGC TACCGAGTGT CTGTCTAAGA ACACAGAGGA GAATTTATTA 3900
TCATTGAAGA ATAGCTTAAA TGACTGCAGT AACCAGGTAA TATTGGCAAA GGCATCTCAG 3960
GAACATCACC TTAGTGAGGA AACAAAATGT TCTGCTAGCT TGTTTTCTTC ACAGTGCAGT 4020
GAATTGGAAG ACTTGACTGC AAATACAAAC ACCCAGGATC CTTTCTTGAT TGGTTCTTCC 4080
AAACAAATGA GGCATCAGTC TGAAAGCCAG GGAGTTGGTC TGAGTGACAA GGAATTGGTT 4140
TCAGATGATG AAGAAAGAGG AACGGGCTTG GAAGAAAATA ATCAAGAAGA GCAAAGCATG 4200
GATTCAAACT TAGGTGAAGC AGCATCTGGG TGTGAGAGTG AAACAAGCGT CTCTGAAGAC 4260
TGCTCAGGGC TATCCTCTCA GAGTGACATT TTAACCACTC AGCAGAGGGA TACCATGCAA 4320
CATAACCTGA TAAAGCTCCA GCAGGAAATG GCTGAACTAG AAGCTGTGTT AGAACAGCAT 4380
GGGAGCCAGC CTTCTAACAG CTACCCTTCC ATCATAAGTG ACTCTTCTGC CCTTGAGGAC 4440
CTGCGAAATC CAGAACAAAG CACATCAGAA AAAGCAGTAT TAACTTCACA GAAAAGTAGT 4500
GAATACCCTA TAAGCCAGAA TCCAGAAGGC CTTTCTGCTG ACAAGTTTGA GGTGTCTGCA 4560
GATAGTTCTA CCAGTAAAAA TAAAGAACCA GGAGTGGAAA GGTCATCCCC TTCTAAATGC 4620
CCATCATTAG ATGATAGGTG GTACATGCAC AGTTGCTCTG GGAGTCTTCA GAATAGAAAC 4680
TACCCATCTC AAGAGGAGCT CATTAAGGTT GTTGATGTGG AGGAGCAACA GCTGGAAGAG 4740
TCTGGGCCAC ACGATTTGAC GGAAACATCT TACTTGCCAA GGCAAGATCT AGAGGGAACC 4800
CCTTACCTGG AATCTGGAAT CAGCCTCTTC TCTGATGACC CTGAATCTGA TCCTTCTGAA 4860
GACAGAGCCC CAGAGTCAGC TCGTGTTGGC AACATACCAT CTTCAACCTC TGCATTGAAA 4920
GTTCCCCAAT TGAAAGTTGC AGAATCTGCC CAGAGTCCAG CTGCTGCTCA TACTACTGAT 4980
ACTGCTGGGT ATAATGCAAT GGAAGAAAGT GTGAGCAGGG AGAAGCCAGA ATTGACAGCT 5040
TCAACAGAAA GGGTCAACAA AAGAATGTCC ATGGTGGTGT CTGGCCTGAC CCCAGAAGAA 5100
TTTATGCTCG TGTACAAGTT TGCCAGAAAA CACCACATCA CTTTAACTAA TCTAATTACT 5160
GAAGAGACTA CTCATGTTGT TATGAAAACA GATGCTGAGT TTGTGTGTGA ACGGACACTG 5220
AAATATTTTC TAGGAATTGC GGGAGGAAAA TGGGTAGTTA GCTATTTCTG GGTGACCCAG 5280
TCTATTAAAG AAAGAAAAAT GCTGAATGAG CATGATTTTG AAGTCAGAGG AGATGTGGTC 5340
AATGGAAGAA ACCACCAAGG TCCAAAGCGA GCAAGAGAAT CCCAGGACAG AAAGATCTTC 5400
AGGGGGCTAG AAATCTGTTG CTATGGGCCC TTCACCAACA TGCCCACAGA TCAACTGGAA 5460
TGGATGGTAC AGCTGTGTGG TGCTTCTGTG GTGAAGGAGC TTTCATCATT CACCCTTGGC 5520
ACAGGTGTCC ACCCAATTGT GGTTGTGCAG CCAGATGCCT GGACAGAGGA CAATGGCTTC 5580
CATGCAATTG GGCAGATGTG TGAGGCACCT GTGGTGACCC GAGAGTGGGT GTTGGACAGT 5640
GTAGCACTCT ACCAGTGCCA GGAGCTGGAC ACCTACCTGA TACCCCAGAT CCCCCACAGC 5700
CACTACTGA 5709

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5689 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:








AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180
TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240
ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300
GTCCTTTATG AGCCTACAAG AAAGTACGAG ATTTAGTCAA CTTGTTGAAG AGCTATTGAA 360
AATCATTTGT GCTTTTCAGC TTGACACAGG TTTGGAGTAT GCAAACAGCT ATAATTTTGC 420
AAAAAAGGAA AATAACTCTC CTGAACATCT AAAAGATGAA GTTTCTATCA TCCAAAGTAT 480
GGGCTACAGA AACCGTGCCA AAAGACTTCT ACAGAGTGAA CCCGAAAATC CTTCCTTGCA 540
GGAAACCAGT CTCAGTGTCC AACTCTCTAA CCTTGGAACT GTGAGAACTC TGAGGACAAA 600
GCAGCGGATA CAACCTCAAA AGACGTCTGT CTACATTGAA TTGGGATCTG ATTCTTCTGA 660
AGATACCGTT AATAAGGCAA CTTATTGCAG TGTGGGAGAT CAAGAATTGT TACAAATCAC 720
CCCTCAAGGA ACCAGGGATG AAATCAGTTT GGATTCTGCA AAAAAGGCTG CTTGTGAATT 780
TTCTGAGACG GATGTAACAA ATACTGAACA TCATCAACCC AGTAATAATG ATTTGAACAC 840
CACTGAGAAG CGTGCAGCTG AGAGGCATCC AGAAAAGTAT CAGGGTAGTT CTGTTTCAAA 900
CTTGCATGTG GAGCCATGTG GCACAAATAC TCATGCCAGC TCATTACAGC ATGAGAACAG 960
CAGTTTATTA CTCACTAAAG ACAGAATGAA TGTAGAAAAG GCTGAATTCT GTAATAAAAG 1020
CAAACAGCCT GGCTTAGCAA GGAGCCAACA TAACAGATGG GCTGGAAGTA AGGAAACATG 1080
TAATGATAGG CGGACTCCCA GCACAGAAAA AAAGGTAGAT CTGAATGCTG ATCCCCTGTG 1140
TGAGAGAAAA GAATGGAATA AGCAGAAACT GCCATGCTCA GAGAATCCTA GAGATACTGA 1200
AGATGTTCCT TGGATAACAC TAAATAGCAG CATTCAGAAA GTTAATGAGT GGTTTTCCAG 1260
AAGTGATGAA CTGTTAGGTT CTGATGACTC ACATGATGGG GAGTCTGAAT CAAATGCCAA 1320
AGTAGCTGAT GTATTGGACG TTCTAAATGA GGTAGATGAA TATTCTGGTT CTTCAGAGAA 1380
AATAGACTTA CTGGCCAGTG ATCCTCATGA GGCTTTAATA TGTAAAAGTG AAAGAGTTCA 1440
CTCCAAATCA GTAGAGAGTA ATATTGAAGA CAAAATATTT GGGAAAACCT ATCGGAAGAA 1500
GGCAAGCCTC CCCAACTTAA GCCATGTAAC TGAAAATCTA ATTATAGGAG CATTTGTTAC 1560
TGAGCCACAG ATAATACAAG AGCGTCCCCT CACAAATAAA TTAAAGCGTA AAAGGAGACC 1620
TACATCAGGC CTTCATCCTG AGGATTTTAT CAAGAAAGCA GATTTGGCAG TTCAAAAGAC 1680
TCCTGAAATG ATAAATCAGG GAACTAACCA AACGGAGCAG AATGGTCAAG TGATGAATAT 1740
TACTAATAGT GGTCATGAGA ATAAAACAAA AGGTGATTCT ATTCAGAATG AGAAAAATCC 1800
TAACCCAATA GAATCACTCG AAAAAGAATC TGCTTTCAAA ACGAAAGCTG AACCTATAAG 1860
CAGCAGTATA AGCAATATGG AACTCGAATT AAATATCCAC AATTCAAAAG CACCTAAAAA 1920
GAATAGGCTG AGGAGGAAGT CTTCTACCAG GCATATTCAT GCGCTTGAAC TAGTAGTCAG 1980
TAGAAATCTA AGCCCACCTA ATTGTACTGA ATTGCAAATT GATAGTTGTT CTAGCAGTGA 2040
AGAGATAAAG AAAAAAAAGT ACAACCAAAT GCCAGTCAGG CACAGCAGAA ACCTACAACT 2100
CATGGAAGGT AAAGAACCTG CAACTGGAGC CAAGAAGAGT AACAAGCCAA ATGAACAGAC 2160
AAGTAAAAGA CATGACAGCG ATACTTTCCC AGAGCTGAAG TTAACAAATG CACCTGGTTC 2220
TTTTACTAAG TGTTCAAATA CCAGTGAACT TAAAGAATTT GTCAATCCTA GCCTTCCAAG 2280
AGAAGAAAAA GAAGAGAAAC TAGAAACAGT TAAAGTGTCT AATAATGCTG AAGACCCCAA 2340
AGATCTCATG TTAAGTGGAG AAAGGGTTTT GCAAACTGAA AGATCTGTAG AGAGTAGCAG 2400
TATTTCATTG GTACCTGGTA CTGATTATGG CACTCAGGAA AGTATCTCGT TACTGGAAGT 2460
TAGCACTCTA GGGAAGGCAA AAACAGAACC AAATAAATGT GTGAGTCAGT GTGCAGCATT 2520
TGAAAACCCC AAGGGACTAA TTCATGGTTG TTCCAAAGAT AATAGAAATG ACACAGAAGG 2580
CTTTAAGTAT CCATTGGGAC ATGAAGTTAA CCACAGTCGG GAAACAAGCA TAGAAATGGA 2640
AGAAAGTGAA CTTGATGCTC AGTATTTGCA GAATACATTC AAGGTTTCAA AGCGCCAGTC 2700
ATTTGCTCCG TTTTCAAATC CAGGAAATGC AGAAGAGGAA TGTGCAACAT TCTCTGCCCA 2760
CTCTGGGTCC TTAAAGAAAC AAAGTCCAAA AGTCACTTTT GAATGTGAAC AAAAGGAAGA 2820
AAATCAAGGA AAGAATGAGT CTAATATCAA GCCTGTACAG ACAGTTAATA TCACTGCAGG 2880
CTTTCCTGTG GTTGGTCAGA AAGATAAGCC AGTTGATAAT GCCAAATGTA GTATCAAAGG 2940
AGGCTCTAGG TTTTGTCTAT CATCTCAGTT CAGAGGCAAC GAAACTGGAC TCATTACTCC 3000
AAATAAACAT GGACTTTTAC AAAACCCATA TCGTATACCA CCACTCTTTC CCATCAAGTC 3060
ATTTGTTAAA ACTAAATGTA AGAAAAATCT GCTAGAGGAA AACTTTGAGG AACATTCAAT 3120
GTCACCTGAA AGAGAAATGG GAAATGAGAA CATTCCAAGT ACAGTGAGCA CAATTAGCCG 3180
TAATAACATT AGAGAAAATG TTTTTAAAGA AGCCAGCTCA AGCAATATTA ATGAAGTAGG 3240
TTCCAGTACT AATGAAGTGG GCTCCAGTAT TAATGAAATA GGTTCCAGTG ATGAAAACAT 3300
ACAGAGGGCC AAAATTGAAT GCTATGCTTA GATTAGGGGT 3360
TTTGCAACCT GAGGTCTATA AACAAAGTCT TCCTGGAAGT AATTGTAAGC ATCCTGAAAT 3420
AAAAAAGCAA GAATATGAAG AAGTAGTTCA GACTGTTAAT ACAGATTTCT CTCCATATCT 3480
GATTTCAGAT AACTTAGAAC AGCCTATGGG AAGTAGTCAT GCATCTCAGG TTTGTTCTGA 3540
GACACCTGAT GACCTGTTAG ATGATGGTGA AATAAAGGAA GATACTAGTT TTGCTGAAAA 3600
TGACATTAAG GAAAGTTCTG CTGTTTTTAG CAAAAGCGTC CAGAAAGGAG AGCTTAGCAG 3660

GAGTCCTAGC CCTTTCACCC ATACACATTT GGCTCAGGGT TACCGAAGAG GGGCCAAGAA 3720 

ATTAGAGTCC TCAGAAGAGA ACTTATCTAG TGAGGATGAA GAGCTTCCCT GCTTCCAACA 3780 

CTTGTTATTT GGTAAAGTAA ACAATATACC TTCTCAGTCT ACTAGGCATA GCACCGTTGC 3840 

TACCGAGTGT CTGTCTAAGA ACACAGAGGA GAATTTATTA TCATTGAAGA ATAGCTTAAA 3900 

TGACTGCAGT AACCAGGTAA TATTGGCAAA GGCATCTCAG GAACATCACC TTAGTGAGGA 3960 

AACAAAATGT TCTGCTAGCT TGTTTTCTTC ACAGTGCAGT GAATTGGAAG ACTTGACTGC 4020 

AAATACAAAC ACCCAGGATC CTTTCTTGAT TGGTTCTTCC AAACAAATGA GGCATCAGTC 4080 

TGAAAGCCAG GGAGTTGGTC TGAGTGACAA GGAATTGGTT TCAGATGATG AAGAAAGAGG 4140 

AACGGGCTTG GAAGAAAATA ATCAAGAAGA GCAAAGCATG GATTCAAACT TAGGTGAAGC 4200 

AGCATCTGGG TGTGAGAGTG AAACAAGCGT CTCTGAAGAC TGCTCAGGGC TATCCTCTCA 4260 

GAGTGACATT TTAACCACTC AGCAGAGGGA TACCATGCAA CATAACCTGA TAAAGCTCCA 4320

GCAGGAAATG GCTGAACTAG AAGCTGTGTT AGAACAGCAT GGGAGCCAGC CTTCTAACAG 4380 

CTACCCTTCC ATCATAAGTG ACTCTTCTGC CCTTGAGGAC CTGCGAAATC CAGAACAAAG 4440 

CACATCAGAA AAAGCAGTAT TAACTTCACA GAAAAGTAGT GAATACCCTA TAAGCCAGAA 4500 

TCCAGAAGGC CTTTCTGCTG ACAAGTTTGA GGTGTCTGCA GATAGTTCTA CCAGTAAAAA 4560 

TAAAGAACCA GGAGTGGAAA GGTCATCCCC TTCTAAATGC CCATCATTAG ATGATAGGTG 4620 

GTACATGCAC AGTTGCTCTG GGAGTCTTCA GAATAGAAAC TACCCATCTC AAGAGGAGCT 4680 

CATTAAGGTT GTTGATGTGG AGGAGCAACA GCTGGAAGAG TCTGGGCCAC ACGATTTGAC 4740 

GGAAACATCT TACTTGCCAA GGCAAGATCT AGAGGGAACC CCTTACCTGG AATCTGGAAT 4800 

CAGCCTCTTC TCTGATGACC CTGAATCTGA TCCTTCTGAA GACAGAGCCC CAGAGTCAGC 4860 

TCGTGTTGGC AACATACCAT CTTCAACCTC TGCATTGAAA GTTCCCCAAT TGAAAGTTGC 4920 

AGAATCTGCC CAGAGTCCAG CTGCTGCTCA TACTACTGAT ACTGCTGGGT ATAATGCAAT 4980 

GGAAGAAAGT GTGAGCAGGG AGAAGCCAGA ATTGACAGCT TCAACAGAAA GGGTCAACAA 5040 

AAGAATGTCC ATGGTGGTGT CTGGCCTGAC CCCAGAAGAA TTTATGCTCG TGTACAAGTT 5100 

TGCCAGAAAA CACCACATCA CTTTAACTAA TCTAATTACT GAAGAGACTA CTCATGTTGT 5160 
TATGAAAACA GATGCTGAGT TTGTGTGTGA ACGGACACTG AAATATTTTC TAGGAATTGC 5220 
GGGAGGAAAA TGGGTAGTTA GCTATTTCTG GGTGACCCAG TCTATTAAAG AAAGAAAAAT 5280 
GCTGAATGAG CATGATTTTG AAGTCAGAGG AGATGTGGTC AATGGAAGAA ACCACCAAGG 5340 
TCCAAAGCGA GCAAGAGAAT CCCAGGACAG AAAGATCTTC AGGGGGCTAG AAATCTGTTG 5400 
CTATGGGCCC TTCACCAACA TGCCCACAGA TCAACTGGAA TGGATGGTAC AGCTGTGTGG 5460 
TGCTTCTGTG GTGAAGGAGC TTTCATCATT CACCCTTGGC ACAGGTGTCC ACCCAATTGT 5520 
GGTTGTGCAG CCAGATGCCT GGACAGAGGA CAATGGCTTC CATGCAATTG GGCAGATGTG 5580

TGAGGCACCT GTGGTGACCC GAGAGTGGGT GTTGGACAGT GTAGCACTCT ACCAGTGCCA 5640 

GGAGCTGGAC ACCTACCTGA TACCCCAGAT CCCCCACAGC CACTACTGA 5689 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5711 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 4 : 

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120 

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240 

ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGG 300 

GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360 

AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420 

ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480 

AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540 

AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600 

CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660 

AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720 

ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780 

CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840 

CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900 

ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960 

GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020 

AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080 

GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140 

ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200 

CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260 

AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320 

GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380 
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980
ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 2220
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280
TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400
AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 2460
AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520
GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580
ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640
GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700
TCAAGGTTTC AAAGCGCCAG TCATTTGCTC CGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760
AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820
TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 2880
AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 2940
ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 3000
ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 3060
CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 3120
AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 3180
GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GAAGCCAGCT 3240
CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 3300
TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 3360
ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 3420
GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 3480
ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 3540
ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 3600
AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 3660
TCCAGAAAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 3720
GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 3780
AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 3840
CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 3900
TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 3960
AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 4020
GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 4080
CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 4140
TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 4200
TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 4260
ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 4320
AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 4380
ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCTTCT GCCCTTGAGG 4440
ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 4500
GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 4560
CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 4620
GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 4680
ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 4740
AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 4800
CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 4860
AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 4920
AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGAGTCC AGCTGCTGCT CATACTACTG 4980
ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 5040
CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 5100
AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 5160
CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 5220
TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 5280
AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 5340 
TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 5400 
TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 5460 
AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 5520 
GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 5580 
TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 5640 
GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 5700

GCCACTACTG A 5711

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 59 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

TGTCCTTAAA AGGTTGATAA TCACTTGCTG AGTGTGTTTC TCAAACAAGT TAATTTCAG 59

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5710 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180
TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240
ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300
GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360
AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420
ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480
AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540
AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600
CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660
AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780
CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840
CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900
ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960
GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020
AGGCTGAATT CTGTAATAAA AGCAAACGCC TGGCTTAGCA AGGAGCCAAC ATAACAGATG 1080
GGCTGGAAGT AAGGAAACAT GTAATGATAG GCGGACTCCC AGCACAGAAA AAAAGGTAGA 1140
TCTGAATGCT GATCCCCTGT GTGAGAGAAA AGAATGGAAT AAGCAGAAAC TGCCATGCTC 1200
AGAGAATCCT AGAGATACTG AAGATGTTCC TTGGATAACA CTAAATAGCA GCATTCAGAA 1260
AGTTAATGAG TGGTTTTCCA GAAGTGATGA ACTGTTAGGT TCTGATGACT CACATGATGG 1320
GGAGTCTGAA TCAAATGCCA AAGTAGCTGA TGTATTGGAC GTTCTAAATG AGGTAGATGA 1380
ATATTCTGGT TCTTCAGAGA AAATAGACTT ACTGGCCAGT GATCCTCATG AGGCTTTAAT 1440
ATGTAAAAGT GAAAGAGTTC ACTCCAAATC AGTAGAGAGT AATATTGAAG ACAAAATATT 1500
TGGGAAAACC TATCGGAAGA AGGCAAGCCT CCCCAACTTA AGCCATGTAA CTGAAAATCT 1560
AATTATAGGA GCATTTGTTA CTGAGCCACA GATAATACAA GAGCGTCCCC TCACAAATAA 1620
ATTAAAGCGT AAAAGGAGAC CTACATCAGG CCTTCATCCT GAGGATTTTA TCAAGAAAGC 1680
AGATTTGGCA GTTCAAAAGA CTCCTGAAAT GATAAATCAG GGAACTAACC AAACGGAGCA 1740
GAATGGTCAA GTGATGAATA TTACTAATAG TGGTCATGAG AATAAAACAA AAGGTGATTC 1800
TATTCAGAAT GAGAAAAATC CTAACCCAAT AGAATCACTC GAAAAAGAAT CTGCTTTCAA 1860
AACGAAAGCT GAACCTATAA GCAGCAGTAT AAGCAATATG GAACTCGAAT TAAATATCCA 1920
CAATTCAAAA GCACCTAAAA AGAATAGGCT GAGGAGGAAG TCTTCTACCA GGCATATTCA 1980
TGCGCTTGAA CTAGTAGTCA GTAGAAATCT AAGCCCACCT AATTGTACTG AATTGCAAAT 2040
TGATAGTTGT TCTAGCAGTG AAGAGATAAA GAAAAAAAAG TACAACCAAA TGCCAGTCAG 2100
GCACAGCAGA AACCTACAAC TCATGGAAGG TAAAGAACCT GCAACTGGAG CCAAGAAGAG 2160
TAACAAGCCA AATGAACAGA CAAGTAAAAG ACATGACAGC GATACTTTCC CAGAGCTGAA 2220
GTTAACAAAT GCACCTGGTT CTTTTACTAA GTGTTCAAAT ACCAGTGAAC TTAAAGAATT 2280
TGTCAATCCT AGCCTTCCAA GAGAAGAAAA AGAAGAGAAA CTAGAAACAG TTAAAGTGTC 2340
TAATAATGCT GAAGACCCCA AAGATCTCAT GTTAAGTGGA GAAAGGGTTT TGCAAACTGA 2400
AAGATCTGTA GAGAGTAGCA GTATTTCATT GGTACCTGGT ACTGATTATG GCACTCAGGA 2460
AAGTATCTCG TTACTGGAAG TTAGCACTCT AGGGAAGGCA AAAACAGAAC CAAATAAATG 2520
TGTGAGTCAG TGTGCAGCAT TTGAAAACCC CAAGGGACTA ATTCATGGTT GTTCCAAAGA 2580

TAATAGAAAT GACACAGAAG GCTTTAAGTA TCCATTGGGA CATGAAGTTA ACCACAGTCG 2640 

GGAAACAAGC ATAGAAATGG AAGAAAGTGA ACTTGATGCT CAGTATTTGC AGAATACATT 2700 

CAAGGTTTCA AAGCGCCAGT CATTTGCTCC GTTTTCAAAT CCAGGAAATG CAGAAGAGGA 2760 

ATGTGCAACA TTCTCTGCCC ACTCTGGGTC CTTAAAGAAA CAAAGTCCAA AAGTCACTTT 2820 

TGAATGTGAA CAAAAGGAAG AAAATCAAGG AAAGAATGAG TCTAATATCA AGCCTGTACA 2880 

GACAGTTAAT ATCACTGCAG GCTTTCCTGT GGTTGGTCAG AAAGATAAGC CAGTTGATAA 2940

TGCCAAATGT AGTATCAAAG GAGGCTCTAG GTTTTGTCTA TCATCTCAGT TCAGAGGCAA 3000

CGAAACTGGA CTCATTACTC CAAATAAACA TGGACTTTTA CAAAACCCAT ATCGTATACC 3060 

ACCACTTTTT CCCATCAAGT CATTTGTTAA AACTAAATGT AAGAAAAATC TGCTAGAGGA 3120 

AAACTTTGAG GAACATTCAA TGTCACCTGA AAGAGAAATG GGAAATGAGA ACATTCCAAG 3180 

TACAGTGAGC ACAATTAGCC GTAATAACAT TAGAGAAAAT GTTTTTAAAG AAGCCAGCTC 3240 

AAGCAATATT AATGAAGTAG GTTCCAGTAC TAATGAAGTG GGCTCCAGTA TTAATGAAAT 3300

AGGTTCCAGT GATGAAAACA TTCAAGCAGA ACTAGGTAGA AACAGAGGGC CAAAATTGAA 3360 

TGCTATGCTT AGATTAGGGG TTTTGCAACC TGAGGTCTAT AAACAAAGTC TTCCTGGAAG 3420 

TAATTGTAAG CATCCTGAAA TAAAAAAGCA AGAATATGAA GAAGTAGTTC AGACTGTTAA 3480 

TACAGATTTC TCTCCATATC TGATTTCAGA TAACTTAGAA CAGCCTATGG GAAGTAGTCA 3540 

TGCATCTCAG GTTTGTTCTG AGACACCTGA TGACCTGTTA GATGATGGTG AAATAAAGGA 3600

AGATACTAGT TTTGCTGAAA ATGACATTAA GGAAAGTTCT GCTGTTTTTA GCAAAAGCGT 3660 

CCAGAAAGGA GAGCTTAGCA GGAGTCCTAG CCCTTTCACC CATACACATT TGGCTCAGGG 3720 

TTACCGAAGA GGGGCCAAGA AATTAGAGTC CTCAGAAGAG AACTTATCTA GTGAGGATGA 3780 

AGAGCTTCCC TGCTTCCAAC ACTTGTTATT TGGTAAAGTA AACAATATAC CTTCTCAGTC 3840 

TACTAGGCAT AGCACCGTTG CTACCGAGTG TCTGTCTAAG AACACAGAGG AGAATTTATT 3900

ATCATTGAAG AATAGCTTAA ATCACTGCAG TAACCAGGTA ATATTGGCAA AGGCATCTCA 3960 

GGAACATCAC CTTAGTGAGG AAACAAAATG TTCTGCTAGC TTGTTTTCTT CACAGTGCAG 4020 

TGAATTGGAA GACTTGACTG CAAATACAAA CACCCAGGAT CCTTTCTTGA TTGGTTCTTC 4080 

CAAACAAATG AGGCATCAGT CTGAAAGCCA GGGAGTTGGT CTGAGTGACA AGGAATTGGT 4140 

TTCAGATGAT GAAGAAAGAG GAACGGGCTT GGAAGAAAAT AATCAAGAAG AGCAAAGCAT 4200

GGATTCAAAC TTAGGTGAAG CAGCATCTGG GTGTGAGAGT GAAACAAGCG TCTCTGAAGA 4260 

CTGCTCAGGG CTATCCTCTC AGAGTGACAT TTTAACCACT CAGCAGAGGG ATACCATGCA 4320 

ACATAACCTG ATAAAGCTCC AGCAGGAAAT GGCTGAACTA GAAGCTGTGT TAGAACAGCA 4380 

TGGGAGCCAG CCTTCTAACA GCTACCCTTC CATCATAAGT GACTCTTCTG CCCTTGAGGA 4440 

30 



40 



50 



60 



10 



20 



30 



WO 96/33271 PCT/US96/05621

CCTGCGAAAT CCAGAACAAA GCACATCAGA AAAAGCAGTA TTAACTTCAC AGAAAAGTAG 4500 

TGAATACCCT ATAAGCCAGA ATCCAGAAGG CCTTTCTGCT GACAAGTTTG AGGTGTCTGC 4560 

AGATAGTTCT ACCAGTAAAA ATAAAGAACC AGGAGTGGAA AGGTCATCCC CTTCTAAATG 4620 

CCCATCATTA GATGATAGGT GGTACATGCA CAGTTGCTCT GGGAGTCTTC AGAATAGAAA 4680 

CTACCCATCT CAAGAGGAGC TCATTAAGGT TGTTGATGTG GAGGAGCAAC AGCTGGAAGA 4740 

GTCTGGGCCA CACGATTTGA CGGAAACATC TTACTTGCCA AGGCAAGATC TAGAGGGAAC 4800 

CCCTTACCTG GAATCTGGAA TCAGCCTCTT CTCTGATGAC CCTGAATCTG ATCCTTCTGA 4860 

AGACAGAGCC CCAGAGTCAG CTCGTGTTGG CAACATACCA TCTTCAACCT CTGCATTGAA 4920

AGTTCCCCAA TTGAAAGTTG CAGAATCTGC CCAGAGTCCA GCTGCTGCTC ATACTACTGA 4980 

TACTGCTGGG TATAATGCAA TGGAAGAAAG TGTGAGCAGG GAGAAGCCAG AATTGACAGC 5040 

TTCAACAGAA AGGGTCAACA AAAGAATGTC CATGGTGGTG TCTGGCCTGA CCCCAGAAGA 5100 

ATTTATGCTC GTGTACAAGT TTGCCAGAAA ACACCACATC ACTTTAACTA ATCTAATTAC 5160 

TGAAGAGACT ACTCATGTTG TTATGAAAAC AGATGCTGAG TTTGTGTGTG AACGGACACT 5220

GAAATATTTT CTAGGAATTG CGGGAGGAAA ATGGGTAGTT AGCTATTTCT GGGTGACCCA 5280 

GTCTATTAAA GAAAGAAAAA TGCTGAATGA GCATGATTTT GAAGTCAGAG GAGATGTGGT 5340 

CAATGGAAGA AACCACCAAG GTCCAAAGCG AGCAAGAGAA TCCCAGGACA GAAAGATCTT 5400 

CAGGGGGCTA GAAATCTGTT GCTATGGGCC CTTCACCAAC ATGCCCACAG ATCAACTGGA 5460 

ATGGATGGTA CAGCTGTGTG GTGCTTCTGT GGTGAAGGAG CTTTCATCAT TCACCCTTGG 5520

CACAGGTGTC CACCCAATTG TGGTTGTGCA GCCAGATGCC TGGACAGAGG ACAATGGCTT 5580 

CCATGCAATT GGGCAGATGT GTGAGGCACC TGTGGTGACC CGAGAGTGGG TGTTGGACAG 5640 

TGTAGCACTC TACCAGTGCC AGGAGCTGGA CACCTACCTG ATACCCCAGA TCCCCCACAG 5700 

CCACTACTGA 5710 
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5709 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120 

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240 
ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300
GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360
AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420
ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480
AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540
AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600
CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660
AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780
CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840
CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900
ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960
GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020
AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080
GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140
ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200
CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320
GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980
ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 2220
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280
TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400
AAAGATCTGT AGAGTAGCAG TATTTCATTG GTACCTGGTA CTGATTATGG CACTCAGGAA 2460
AGTATCTCGT TACTGGAAGT TAGCACTCTA GGGAAGGCAA AAACAGAACC AAATAAATGT 2520
GTGAGTCAGT GTGCAGCATT TGAAAACCCC AAGGGACTAA TTCATGGTTG TTCCAAAGAT 2580
AATAGAAATG ACACAGAAGG CTTTAAGTAT CCATTGGGAC ATGAAGTTAA CCACAGTCGG 2640
GAAACAAGCA TAGAAATGGA AGAAAGTGAA CTTGATGCTC AGTATTTGCA GAATACATTC 2700
AAGGTTTCAA AGCGCCAGTC ATTTGCTCCG TTTTCAAATC CAGGAAATGC AGAAGAGGAA 2760
TGTGCAACAT TCTCTGCCCA CTCTGGGTCC TTAAAGAAAC AAAGTCCAAA AGTCACTTTT 2820
GAATGTGAAC AAAAGGAAGA AAATCAAGGA AAGAATGAGT CTAATATCAA GCCTGTACAG 2880
ACAGTTAATA TCACTGCAGG CTTTCCTGTG GTTGGTCAGA AAGATAAGCC AGTTGATAAT 2940
GCCAAATGTA GTATCAAAGG AGGCTCTAGG TTTTGTCTAT CATCTCAGTT CAGAGGCAAC 3000
GAAACTGGAC TCATTACTCC AAATAAACAT GGACTTTTAC AAAACCCATA TCGTATACCA 3060
CCACTTTTTC CCATCAAGTC ATTTGTTAAA ACTAAATGTA AGAAAAATCT GCTAGAGGAA 3120
AACTTTGAGG AACATTCAAT GTCACCTGAA AGAGAAATGG GAAATGAGAA CATTCCAAGT 3180
ACAGTGAGCA CAATTAGCCG TAATAACATT AGAGAAAATG TTTTTAAAGA AGCCAGCTCA 3240
AGCAATATTA ATGAAGTAGG TTCCAGTACT AATGAAGTGG GCTCCAGTAT TAATGAAATA 3300
GGTTCCAGTG ATGAAAACAT TCAAGCAGAA CTAGGTAGAA ACAGAGGGCC AAAATTGAAT 3360
GCTATGCTTA GATTAGGGGT TTTGCAACCT GAGGTCTATA AACAAAGTCT TCCTGGAAGT 3420
AATTGTAAGC ATCCTGAAAT AAAAAAGCAA GAATATGAAG AAGTAGTTCA GACTGTTAAT 3480
ACAGATTTCT CTCCATATCT GATTTCAGAT AACTTAGAAC AGCCTATGGG AAGTAGTCAT 3540
GCATCTCAGG TTTGTTCTGA GACACCTGAT GACCTGTTAG ATGATGGTGA AATAAAGGAA 3600
GATACTAGTT TTGCTGAAAA TGACATTAAG GAAAGTTCTG CTGTTTTTAG CAAAAGCGTC 3660
CAGAAAGGAG AGCTTAGCAG GAGTCCTAGC CCTTTCACCC ATACACATTT GGCTCAGGGT 3720
TACCGAAGAG GGGCCAAGAA ATTAGAGTCC TCAGAAGAGA ACTTATCTAG TGAGGATGAA 3780
GAGCTTCCCT GCTTCCAACA CTTGTTATTT GGTAAAGTAA ACAATATACC TTCTCAGTCT 3840
ACTAGGCATA GCACCGTTGC TACCGAGTGT CTGTCTAAGA ACACAGAGGA GAATTTATTA 3900
TCATTGAAGA ATAGCTTAAA TGACTGCAGT AACCAGGTAA TATTGGCAAA GGCATCTCAG 3960
GAACATCACC TTAGTGAGGA AACAAAATGT TCTGCTAGCT TGTTTTCTTC ACAGTGCAGT 4020
GAATTGGAAG ACTTGACTGC AAATACAAAC ACCCAGGATC CTTTCTTGAT TGGTTCTTCC 4080
AAACAAATGA GGCATCAGTC TGAAAGCCAG GGAGTTGGTC TGAGTGACAA GGAATTGGTT 4140
TCAGATGATG AAGAAAGAGG AACGGGCTTG GAAGAAAATA ATCAAGAAGA GCAAAGCATG 4200
GATTCAAACT TAGGTGAAGC AGCATCTGGG TGTGAGAGTG AAACAAGCGT CTCTGAAGAC 4260
TGCTCAGGGC TATCCTCTCA GAGTGACATT TTAACCACTC AGCAGAGGGA TACCATGCAA 4320
CATAACCTGA TAAAGCTCCA GCAGGAAATG GCTGAACTAG AAGCTGTGTT AGAACAGCAT 4380
GGGAGCCAGC CTTCTAACAG CTACCCTTCC ATCATAAGTG ACTCTTCTGC CCTTGAGGAC 4440
CTGCGAAATC CAGAACAAAG CACATCAGAA AAAGCAGTAT TAACTTCACA GAAAAGTAGT 4500
GAATACCCTA TAAGCCAGAA TCCAGAAGGC CTTTCTGCTG ACAAGTTTGA GGTGTCTGCA 4560
GATAGTTCTA CCAGTAAAAA TAAAGAACCA GGAGTGGAAA GGTCATCCCC TTCTAAATGC 4620
CCATCATTAG ATGATAGGTG GTACATGCAC AGTTGCTCTG GGAGTCTTCA GAATAGAAAC 4680
TACCCATCTC AAGAGGAGCT CATTAAGGTT GTTGATGTGG AGGAGCAACA GCTGGAAGAG 4740
TCTGGGCCAC ACGATTTGAC GGAAACATCT TACTTGCCAA GGCAAGATCT AGAGGGAACC 4800
CCTTACCTGG AATCTGGAAT CAGCCTCTTC TCTGATGACC CTGAATCTGA TCCTTCTGAA 4860
GACAGAGCCC CAGAGTCAGC TCGTGTTGGC AACATACCAT CTTCAACCTC TGCATTGAAA 4920
GTTCCCCAAT TGAAAGTTGC AGAATCTGCC CAGAGTCCAG CTGCTGCTCA TACTACTGAT 4980
ACTGCTGGGT ATAATGCAAT GGAAGAAAGT GTGAGCAGGG AGAAGCCAGA ATTGACAGCT 5040
TCAACAGAAA GGGTCAACAA AAGAATGTCC ATGGTGGTGT CTGGCCTGAC CCCAGAAGAA 5100
TTTATGCTCG TGTACAAGTT TGCCAGAAAA CACCACATCA CTTTAACTAA TCTAATTACT 5160
GAAGAGACTA CTCATGTTGT TATGAAAACA GATGCTGAGT TTGTGTGTGA ACGGACACTG 5220
AAATATTTTC TAGGAATTGC GGGAGGAAAA TGGGTAGTTA GCTATTTCTG GGTGACCCAG 5280
TCTATTAAAG AAAGAAAAAT GCTGAATGAG CATGATTTTG AAGTCAGAGG AGATGTGGTC 5340
AATGGAAGAA ACCACCAAGG TCCAAAGCGA GCAAGAGAAT CCCAGGACAG AAAGATCTTC 5400
AGGGGGCTAG AAATCTGTTG CTATGGGCCC TTCACCAACA TGCCCACAGA TCAACTGGAA 5460
TGGATGGTAC AGCTGTGTGG TGCTTCTGTG GTGAAGGAGC TTTCATCATT CACCCTTGGC 5520
ACAGGTGTCC ACCCAATTGT GGTTGTGCAG CCAGATGCCT GGACAGAGGA CAATGGCTTC 5580
CATGCAATTG GGCAGATGTG TGAGGCACCT GTGGTGACCC GAGAGTGGGT GTTGGACAGT 5640
GTAGCACTCT ACCAGTGCCA GGAGCTGGAC ACCTACCTGA TACCCCAGAT CCCCCACAGC 5700
CACTACTGA 5709


(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5709 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180
TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240
ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300
GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360
AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420
ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480
AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540
AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600
CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660
AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780
CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840
CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900
ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960
GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020
AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080
GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140
ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200
CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320
GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980
ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 2220
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280
TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400
AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 2460
AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520
GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580
ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640
GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700
TCAAGGTTTC AAAGCGCCAG TCATTTGCTC CGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760
AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAC AAAGTCCAAA AGTCACTTTT 2820
GAATGTGAAC AAAAGGAAGA AAATCAAGGA AAGAATGAGT CTAATATCAA GCCTGTACAG 2880
ACAGTTAATA TCACTGCAGG CTTTCCTGTG GTTGGTCAGA AAGATAAGCC AGTTGATAAT 2940
GCCAAATGTA GTATCAAAGG AGGCTCTAGG TTTTGTCTAT CATCTCAGTT CAGAGGCAAC 3000
GAAACTGGAC TCATTACTCC AAATAAACAT GGACTTTTAC AAAACCCATA TCGTATACCA 3060
CCACTTTTTC CCATCAAGTC ATTTGTTAAA ACTAAATGTA AGAAAAATCT GCTAGAGGAA 3120
AACTTTGAGG AACATTCAAT GTCACCTGAA AGAGAAATGG GAAATGAGAA CATTCCAAGT 3180
ACAGTGAGCA CAATTAGCCG TAATAACATT AGAGAAAATG TTTTTAAAGA AGCCAGCTCA 3240
AGCAATATTA ATGAAGTAGG TTCCAGTACT AATGAAGTGG GCTCCAGTAT TAATGAAATA 3300
GGTTCCAGTG ATGAAAACAT TCAAGCAGAA CTAGGTAGAA ACAGAGGGCC AAAATTGAAT 3360
GCTATGCTTA GATTAGGGGT TTTGCAACCT GAGGTCTATA AACAAAGTCT TCCTGGAAGT 3420
AATTGTAAGC ATCCTGAAAT AAAAAAGCAA GAATATGAAG AAGTAGTTCA GACTGTTAAT 3480
ACAGATTTCT CTCCATATCT GATTTCAGAT AACTTAGAAC AGCCTATGGG AAGTAGTCAT 3540
GCATCTCAGG TTTGTTCTGA GACACCTGAT GACCTGTTAG ATGATGGTGA AATAAAGGAA 3600
GATACTAGTT TTGCTGAAAA TGACATTAAG GAAAGTTCTG CTGTTTTTAG CAAAAGCGTC 3660
CAGAAAGGAG AGCTTAGCAG GAGTCCTAGC CCTTTCACCC ATACACATTT GGCTCAGGGT 3720
TACCGAAGAG GGGCCAAGAA ATTAGAGTCC TCAGAAGAGA ACTTATCTAG TGAGGATGAA 3780
GAGCTTCCCT GCTTCCAACA CTTGTTATTT GGTAAAGTAA ACAATATACC TTCTCAGTCT 3840
ACTAGGCATA GCACCGTTGC TACCGAGTGT CTGTCTAAGA ACACAGAGGA GAATTTATTA 3900
TCATTGAAGA ATAGCTTAAA TGACTGCAGT AACCAGGTAA TATTGGCAAA GGCATCTCAG 3960
GAACATCACC TTAGTGAGGA AACAAAATGT TCTGCTAGCT TGTTTTCTTC ACAGTGCAGT 4020
GAATTGGAAG ACTTGACTGC AAATACAAAC ACCCAGGATC CTTTCTTGAT TGGTTCTTCC 4080
AAACAAATGA GGCATCAGTC TGAAAGCCAG GGAGTTGGTC TGAGTGACAA GGAATTGGTT 4140
TCAGATGATG AAGAAAGAGG AACGGGCTTG GAAGAAAATA ATCAAGAAGA GCAAAGCATG 4200
GATTCAAACT TAGGTGAAGC AGCATCTGGG TGTGAGAGTG AAACAAGCGT CTCTGAAGAC 4260
TGCTCAGGGC TATCCTCTCA GAGTGACATT TTAACCACTC AGCAGAGGGA TACCATGCAA 4320
CATAACCTGA TAAAGCTCCA GCAGGAAATG GCTGAACTAG AAGCTGTGTT AGAACAGCAT 4380
GGGAGCCAGC CTTCTAACAG CTACCCTTCC ATCATAAGTG ACTCTTCTGC CCTTGAGGAC 4440
CTGCGAAATC CAGAACAAAG CACATCAGAA AAAGCAGTAT TAACTTCACA GAAAAGTAGT 4500
GAATACCCTA TAAGCCAGAA TCCAGAAGGC CTTTCTGCTG ACAAGTTTGA GGTGTCTGCA 4560
GATAGTTCTA CCAGTAAAAA TAAAGAACCA GGAGTGGAAA GGTCATCCCC TTCTAAATGC 4620
CCATCATTAG ATGATAGGTG GTACATGCAC AGTTGCTCTG GGAGTCTTCA GAATAGAAAC 4680
TACCCATCTC AAGAGGAGCT CATTAAGGTT GTTGATGTGG AGGAGCAACA GCTGGAAGAG 4740
TCTGGGCCAC ACGATTTGAC GGAAACATCT TACTTGCCAA GGCAAGATCT AGAGGGAACC 4800
CCTTACCTGG AATCTGGAAT CAGCCTCTTC TCTGATGACC CTGAATCTGA TCCTTCTGAA 4860
GACAGAGCCC CAGAGTCAGC TCGTGTTGGC AACATACCAT CTTCAACCTC TGCATTGAAA 4920
GTTCCCCAAT TGAAAGTTGC AGAATCTGCC CAGAGTCCAG CTGCTGCTCA TACTACTGAT 4980
ACTGCTGGGT ATAATGCAAT GGAAGAAAGT GTGAGCAGGG AGAAGCCAGA ATTGACAGCT 5040
TCAACAGAAA GGGTCAACAA AAGAATGTCC ATGGTGGTGT CTGGCCTGAC CCCAGAAGAA 5100
TTTATGCTCG TGTACAAGTT TGCCAGAAAA CACCACATCA CTTTAACTAA TCTAATTACT 5160
GAAGAGACTA CTCATGTTGT TATGAAAACA GATGCTGAGT TTGTGTGTGA ACGGACACTG 5220
AAATATTTTC TAGGAATTGC GGGAGGAAAA TGGGTAGTTA GCTATTTCTG GGTGACCCAG 5280
TCTATTAAAG AAAGAAAAAT GCTGAATGAG CATGATTTTG AAGTCAGAGG AGATGTGGTC 5340
AATGGAAGAA ACCACCAAGG TCCAAAGCGA GCAAGAGAAT CCCAGGACAG AAAGATCTTC 5400
AGGGGGCTAG AAATCTGTTG CTATGGGCCC TTCACCAACA TGCCCACAGA TCAACTGGAA 5460
TGGATGGTAC AGCTGTGTGG TGCTTCTGTG GTGAAGGAGC TTTCATCATT CACCCTTGGC 5520
ACAGGTGTCC ACCCAATTGT GGTTGTGCAG CCAGATGCCT GGACAGAGGA CAATGGCTTC 5580
CATGCAATTG GGCAGATGTG TGAGGCACCT GTGGTGACCC GAGAGTGGGT GTTGGACAGT 5640
GTAGCACTCT ACCAGTGCCA GGAGCTGGAC ACCTACCTGA TACCCCAGAT CCCCCACAGC 5700
CACTACTGA 5709

5 (2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5709 base pairs 

(B) TYPE: nucleic acid 
10 (C) STRAND EDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
15 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120 

20 TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240 

ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300

GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360 

AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420 

ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480

AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540 

AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600

CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660 

AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720 

ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780

CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840 

CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900

ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960 

GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020 

AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080

GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140 

ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200

CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260 

AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320 

GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380

AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440 
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500

TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560

TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620

AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680

CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740

AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800

CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860

AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920

ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980

ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040

TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100

GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160

GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 2220

AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280

TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340

CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400

AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 2460

AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520

GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580

ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640

GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700

TCAAGGTTTC AAAGCGCCAG TCATTTGCTC CGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760

AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820

TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTAATATCAA GCCTGTACAG 2880

ACAGTTAATA TCACTGCAGG CTTTCCTGTG GTTGGTCAGA AAGATAAGCC AGTTGATAAT 2940

GCCAAATGTA GTATCAAAGG AGGCTCTAGG TTTTGTCTAT CATCTCAGTT CAGAGGCAAC 3000

GAAACTGGAC TCATTACTCC AAATAAACAT GGACTTTTAC AAAACCCATA TCGTATACCA 3060

CCACTTTTTC CCATCAAGTC ATTTGTTAAA ACTAAATGTA AGAAAAATCT GCTAGAGGAA 3120

AACTTTGAGG AACATTCAAT GTCACCTGAA AGAGAAATGG GAAATGAGAA CATTCCAAGT 3180

ACAGTGAGCA CAATTAGCCG TAATAACATT AGAGAAAATG TTTTTAAAGA AGCCAGCTCA 3240

AGCAATATTA ATGAAGTAGG TTCCAGTACT AATGAAGTGG GCTCCAGTAT TAATGAAATA 3300

GGTTCCAGTG ATGAAAACAT TCAAGCAGAA CTAGGTAGAA ACAGAGGGCC AAAATTGAAT 3360

GCTATGCTTA GATTAGGGGT TTTGCAACCT GAGGTCTATA AACAAAGTCT TCCTGGAAGT 3420

AATTGTAAGC ATCCTGAAAT AAAAAAGCAA GAATATGAAG AAGTAGTTCA GACTGTTAAT 3480 

ACAGATTTCT CTCCATATCT GATTTCAGAT AACTTAGAAC AGCCTATGGG AAGTAGTCAT 3540 

GCATCTCAGG TTTGTTCTGA GACACCTGAT GACCTGTTAG ATGATGGTGA AATAAAGGAA 3600 

GATACTAGTT TTGCTGAAAA TGACATTAAG GAAAGTTCTG CTGTTTTTAG CAAAAGCGTC 3660 

CAGAAAGGAG AGCTTAGCAG GAGTCCTAGC CCTTTCACCC ATACACATTT GGCTCAGGGT 3720 

TACCGAAGAG GGGCCAAGAA ATTAGAGTCC TCAGAAGAGA ACTTATCTAG TGAGGATGAA 3780 

GAGCTTCCCT GCTTCCAACA CTTGTTATTT GGTAAAGTAA ACAATATACC TTCTCAGTCT 3840

ACTAGGCATA GCACCGTTGC TACCGAGTGT CTGTCTAAGA ACACAGAGGA GAATTTATTA 3900 

TCATTGAAGA ATAGCTTAAA TGACTGCAGT AACCAGGTAA TATTGGCAAA GGCATCTCAG 3960 

GAACATCACC TTAGTGAGGA AACAAAATGT TCTGCTAGCT TGTTTTCTTC ACAGTGCAGT 4020 

GAATTGGAAG ACTTGACTGC AAATACAAAC ACCCAGGATC CTTTCTTGAT TGGTTCTTCC 4080 

AAACAAATGA GGCATCAGTC TGAAAGCCAG GGAGTTGGTC TGAGTGACAA GGAATTGGTT 4140

TCAGATGATG AAGAAAGAGG AACGGGCTTG GAAGAAAATA ATCAAGAAGA GCAAAGCATG 4200 

GATTCAAACT TAGGTGAAGC AGCATCTGGG TGTGAGAGTG AAACAAGCGT CTCTGAAGAC 4260 

TGCTCAGGGC TATCCTCTCA GAGTGACATT TTAACCACTC AGCAGAGGGA TACCATGCAA 4320 

CATAACCTGA TAAAGCTCCA GCAGGAAATG GCTGAACTAG AAGCTGTGTT AGAACAGCAT 4380 

GGGAGCCAGC CTTCTAACAG CTACCCTTCC ATCATAAGTG ACTCTTCTGC CCTTGAGGAC 4440

CTGCGAAATC CAGAACAAAG CACATCAGAA AAAGCAGTAT TAACTTCACA GAAAAGTAGT 4500 

GAATACCCTA TAAGCCAGAA TCCAGAAGGC CTTTCTGCTG ACAAGTTTGA GGTGTCTGCA 4560 

GATAGTTCTA CCAGTAAAAA TAAAGAACCA GGAGTGGAAA GGTCATCCCC TTCTAAATGC 4620 

CCATCATTAG ATGATAGGTG GTACATGCAC AGTTGCTCTG GGAGTCTTCA GAATAGAAAC 4680 

TACCCATCTC AAGAGGAGCT CATTAAGGTT GTTGATGTGG AGGAGCAACA GCTGGAAGAG 4740

TCTGGGCCAC ACGATTTGAC GGAAACATCT TACTTGCCAA GGCAAGATCT AGAGGGAACC 4800

CCTTACCTGG AATCTGGAAT CAGCCTCTTC TCTGATGACC CTGAATCTGA TCCTTCTGAA 4860 

GACAGAGCCC CAGAGTCAGC TCGTGTTGGC AACATACCAT CTTCAACCTC TGCATTGAAA 4920 

GTTCCCCAAT TGAAAGTTGC AGAATCTGCC CAGAGTCCAG CTGCTGCTCA TACTACTGAT 4980 

ACTGCTGGGT ATAATGCAAT GGAAGAAAGT GTGAGCAGGG AGAAGCCAGA ATTGACAGCT 5040

TCAACAGAAA GGGTCAACAA AAGAATGTCC ATGGTGGTGT CTGGCCTGAC CCCAGAAGAA 5100 

TTTATGCTCG TGTACAAGTT TGCCAGAAAA CACCACATCA CTTTAACTAA TCTAATTACT 5160 

GAAGAGACTA CTCATGTTGT TATGAAAACA GATGCTGAGT TTGTGTGTGA ACGGACACTG 5220 
AAATATTTTC TAGGAATTGC GGGAGGAAAA TGGGTAGTTA GCTATTTCTG GGTGACCCAG 5280 
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TCTATTAAAG AAAGAAAAAT GCTGAATGAG CATGATTTTG AAGTCAGAGG AGATGTGGTC 5340 

AATGGAAGAA ACCACCAAGG TCCAAAGCGA GCAAGAGAAT CCCAGGACAG AAAGATCTTC 5400 

AGGGGGCTAG AAATCTGTTG CTATGGGCCC TTCACCAACA TGCCCACAGA TCAACTGGAA 5460 

TGGATGGTAC AGCTGTGTGG TGCTTCTGTG GTGAAGGAGC TTTCATCATT CACCCTTGGC 5520 

ACAGGTGTCC ACCCAATTGT GGTTGTGCAG CCAGATGCCT GGACAGAGGA CAATGGCTTC 5580 

CATGCAATTG GGCAGATGTG TGAGGCACCT GTGGTGACCC GAGAGTGGGT GTTGGACAGT 5640 

GTAGCACTCT ACCAGTGCCA GGAGCTGGAC ACCTACCTGA TACCCCAGAT CCCCCACAGC 5700 
CACTACTGA 5709

(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5711 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120 

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240

ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300 

GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360 

AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420 

ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480 

AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540

AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600 

CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660 

AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720 

ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780 

CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840

CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900

ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960 

GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020 

AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080
GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 
ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 
CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 
GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 
ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 
TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 
AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 
AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 
GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 
ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 
GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 
TCAAGGTTTC AAAGCGCCAG TCATTTGCTC CGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 
AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 
TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 
AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 
ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 
ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 
CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 
AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 
GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GAAGCCAGCT 
CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 
TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 
ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 
GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 
ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 
ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 
AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 
TCCAGAAAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 
GTTACTGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 
AAGAGCTTCC CTCCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 
CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 
TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 
AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 
GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 
CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 
TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 
TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 
ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 
AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 
ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCTTCT GCCCTTGAGG 
ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 
GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 
CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 
GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 
ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 
AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 
CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 
AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 
AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGAGTCC AGCTGCTGCT CATACTACTG 
ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 
CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 
AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 
CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 
TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 
AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 
TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 
TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 
AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 
GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 
TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 
GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 
GCCACTACTG A 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5707 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 
TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 
ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 
GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 
AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 
ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 
AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 
AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 
CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 
AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 
CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 

CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900

ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960 

GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020 

AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080

GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140 

ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200

CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260 

AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320 

GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380

AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440 

TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500

TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560 

TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620 

AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680

CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740 

AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800

CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860 

AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920 

ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980

ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040 

TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100

GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160 

GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 2220 

AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280 

TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340 

CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400

AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 2460 
AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520 
GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580 
ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640 
GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 
TCAAGGTTTC AAAGCGCCAG TCATTTGCTC CGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 
AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 
TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 
AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 
ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 
ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 
CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 
AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 
GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GAAGCCAGCT 
CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 
TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 
ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 
GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 
ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 
ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 
AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 
TCCAGAAAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 
GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 
AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 
CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 
TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 
AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 
GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 
CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 
TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAAGAAGAGC AAAGCATGGA 
TTCAAACTTA GGTGAAGCAG CATCTGGGTG TGAGAGTGAA ACAAGCGTCT CTGAAGACTG 
CTCAGGGCTA TCCTCTCAGA GTGACATTTT AACCACTCAG CAGAGGGATA CCATGCAACA 
TAACCTGATA AAGCTCCAGC AGGAAATGGC TGAACTAGAA GCTGTGTTAG AACAGCATGG 
GAGCCAGCCT TCTAACAGCT ACCCTTCCAT CATAAGTGAC TCTTCTGCCC TTGAGGACCT 
GCGAAATCCA GAACAAAGCA CATCAGAAAA AGCAGTATTA ACTTCACAGA AAAGTAGTGA 
ATACCCTATA AGCCAGAATC CAGAAGGCCT TTCTGCTGAC AAGTTTGAGG TGTCTGCAGA 
TAGTTCTACC AGTAAAAATA AAGAACCAGG AGTGGAAAGG TCATCCCCTT CTAAATGCCC 4620

ATCATTAGAT GATAGGTGGT ACATGCACAG TTGCTCTGGG AGTCTTCAGA ATAGAAACTA 4680 

CCCATCTCAA GAGGAGCTCA TTAAGGTTGT TGATGTGGAG GAGCAACAGC TGGAAGAGTC 4740

TGGGCCACAC GATTTGACGG AAACATCTTA CTTGCCAAGG CAAGATCTAG AGGGAACCCC 4800 

TTACCTGGAA TCTGGAATCA GCCTCTTCTC TGATGACCCT GAATCTGATC CTTCTGAAGA 4860 

CAGAGCCCCA GAGTCAGCTC GTGTTGGCAA CATACCATCT TCAACCTCTG CATTGAAAGT 4920

TCCCCAATTG AAAGTTGCAG AATCTGCCCA GAGTCCAGCT GCTGCTCATA CTACTGATAC 4980 

TGCTGGGTAT AATGCAATGG AAGAAAGTGT GAGCAGGGAG AAGCCAGAAT TGACAGCTTC 5040

AACAGAAAGG GTCAACAAAA GAATGTCCAT GGTGGTGTCT GGCCTGACCC CAGAAGAATT 5100 

TATGCTCGTG TACAAGTTTG CCAGAAAACA CCACATCACT TTAACTAATC TAATTACTGA 5160 

AGAGACTACT CATGTTGTTA TGAAAACAGA TGCTGAGTTT GTGTGTGAAC GGACACTGAA 5220

ATATTTTCTA GGAATTGCGG GAGGAAAATG GGTAGTTAGC TATTTCTGGG TGACCCAGTC 5280 

TATTAAAGAA AGAAAAATGC TGAATGAGCA TGATTTTGAA GTCAGAGGAG ATGTGGTCAA 5340

TGGAAGAAAC CACCAAGGTC CAAAGCGAGC AAGAGAATCC CAGGACAGAA AGATCTTCAG 5400 

GGGGCTAGAA ATCTGTTGCT ATGGGCCCTT CACCAACATG CCCACAGATC AACTGGAATG 5460 

GATGGTACAG CTGTGTGGTG CTTCTGTGGT GAAGGAGCTT TCATCATTCA CCCTTGGCAC 5520

AGGTGTCCAC CCAATTGTGG TTGTGCAGCC AGATGCCTGG ACAGAGGACA ATGGCTTCCA 5580 

TGCAATTGGG CAGATGTGTG AGGCACCTGT GGTGACCCGA GAGTGGGTGT TGGACAGTGT 5640

AGCACTCTAC CAGTGCCAGG AGCTGGACAC CTACCTGATA CCCCAGATCC CCCACAGCCA 5700 

CTACTGA 5707

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5712 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240 

ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300 

GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360 
AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 
ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 
AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 
AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 
CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 
AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 
CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 
CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 
ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 
GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 
AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 
GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 
ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 
CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 
GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 
ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 
TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 
AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 
AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 
GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 
ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 
GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 
TCAAGGTTTC AAAGCGCCAG TCATTTGCTC CGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 
AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 
TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 
AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 
ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 
ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 
CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 
AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 
GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GAAGCCAGCT 
CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 
TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 
ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 
GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 
ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 
ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 
AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 
TCCAGAAAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 
GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 
AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 
CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 
TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 
AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 
GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 
CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 
TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 
TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 
ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 
AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 
ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCTTCT GCCCTTGAGG 
ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 
GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 
CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 
GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 
ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 
AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 
CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 
AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 
AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGAGTCC AGCTGCTGCT CATACTACTG 
ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 
CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 
AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 
CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 
TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 
AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 
TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 
TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 
AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 
GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 
TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 
GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTAACC TGATACCCCA GATCCCCCAC 
AGCCACTACT GA 

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile
20 25



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Val
35

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu
50 55 60



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1863 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Gly Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Pro Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn Ile Thr Ala Gly
915 920 925

Phe Pro Val Val Gly Gln Lys Asp Lys Pro Val Asp Asn Ala Lys Cys
930 935 940

Ser Ile Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gln Phe Arg Gly
945 950 955 960

Asn Glu Thr Gly Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gln Asn
965 970 975

Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val Lys Thr
980 985 990

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met
995 1000 1005

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn Ile Pro Ser Thr Val Ser
1010 1015 1020

Thr Ile Ser Arg Asn Asn Ile Arg Glu Asn Val Phe Lys Glu Ala Ser
1025 1030 1035 1040

Ser Ser Asn Ile Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser
1045 1050 1055

Ser Ile Asn Glu Ile Gly Ser Ser Asp Glu Asn Ile Gln Ala Glu Leu
1060 1065 1070

Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val
1075 1080 1085

Leu Gln Pro Glu Val Tyr Lys Gln Ser Leu Pro Gly Ser Asn Cys Lys
1090 1095 1100

His Pro Glu Ile Lys Lys Gln Glu Tyr Glu Glu Val Val Gln Thr Val
1105 1110 1115 1120

Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Glu Gln Pro
1125 1130 1135

Met Gly Ser Ser His Ala Ser Gln Val Cys Ser Glu Thr Pro Asp Asp
1140 1145 1150

Leu Leu Asp Asp Gly Glu Ile Lys Glu Asp Thr Ser Phe Ala Glu Asn
1155 1160 1165

Asp Ile Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gln Lys Gly
1170 1175 1180

Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gln
1185 1190 1195 1200

Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu
1205 1210 1215

Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gln His Leu Leu Phe Gly
1220 1225 1230

Lys Val Asn Asn Ile Pro Ser Gln Ser Thr Arg His Ser Thr Val Ala
1235 1240 1245

Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys
1250 1255 1260

Asn Ser Leu Asn Asp Cys Ser Asn Gln Val Ile Leu Ala Lys Ala Ser
1265 1270 1275 1280

Gln Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe
1285 1290 1295

Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr
1300 1305 1310

Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser
1315 1320 1325

Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp
1330 1335 1340

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser
1345 1350 1355 1360

Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr
1365 1370 1375

Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu
1380 1385 1390

Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln
1395 1400 1405

Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln
1410 1415 1420

Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu
1425 1430 1435 1440

Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr
1445 1450 1455

Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu
1460 1465 1470

Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn
1475 1480 1485

Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu
1490 1495 1500

Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg
1505 1510 1515 1520

Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu
1525 1530 1535

Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr
1540 1545 1550

Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile
1555 1560 1565

Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala
1570 1575 1580

Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600

Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Ser Pro Ala Ala
1605 1610 1615

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645

Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660

Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680

Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695

Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710

Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725

Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740

Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760

Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775

Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790

Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805

Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820

Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840

Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro
1845 1850 1855

Gln Ile Pro His Ser His Tyr
1860

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 80 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Ser Val Leu Lys Arg Leu Ile Ile Thr Cys
65 70 75 80

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 312 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Arg Leu Ala
305 310

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 765 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu
755 760 765

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 900 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Pro Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Thr Lys Ser
885 890 895

Lys Ser His Phe
900

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 914 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Pro Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

Asn Glu

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1202 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Pro Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn Ile Thr Ala Gly
915 920 925

Phe Pro Val Val Gly Gln Lys Asp Lys Pro Val Asp Asn Ala Lys Cys
930 935 940

Ser Ile Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gln Phe Arg Gly
945 950 955 960

Asn Glu Thr Gly Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gln Asn
965 970 975

Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val Lys Thr
980 985 990

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met
995 1000 1005

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn Ile Pro Ser Thr Val Ser
1010 1015 1020

Thr Ile Ser Arg Asn Asn Ile Arg Glu Asn Val Phe Lys Glu Ala Ser
1025 1030 1035 1040

Ser Ser Asn Ile Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser
1045 1050 1055

Ser Ile Asn Glu Ile Gly Ser Ser Asp Glu Asn Ile Gln Ala Glu Leu
1060 1065 1070

Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val
1075 1080 1085

Leu Gln Pro Glu Val Tyr Lys Gln Ser Leu Pro Gly Ser Asn Cys Lys
1090 1095 1100

His Pro Glu Ile Lys Lys Gln Glu Tyr Glu Glu Val Val Gln Thr Val
1105 1110 1115 1120

PCT/US96/05621 

WO 96/33271 

Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Glu Gln Pro
1125 1130 1135

Met Gly Ser Ser His Ala Ser Gln Val Cys Ser Glu Thr Pro Asp Asp
1140 1145 1150

Leu Leu Asp Asp Gly Glu Ile Lys Glu Asp Thr Ser Phe Ala Glu Asn
1155 1160 1165

Asp Ile Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gln Lys Gly
1170 1175 1180

Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gln
1185 1190 1195 1200

Gly Tyr

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1363 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

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Ser Val Tyr lie Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn 
1 8 0 ^" ^ 

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gin Glu Leu Leu Gin lie Thr 
5 195 200 205 

Pro Gin Gly Thr Arg Asp Glu lie Ser Leu Asp Ser Ala Lys Lys Ala 
210 215 220 

10 Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gin 

225 230 235 240 

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg 
245 250 255 

15 His Pro Glu Lys Tyr Gin Gly Ser Ser Val Ser Asn Leu His Val Glu 

260 265 270 

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gin His Glu Asn Ser 
20 275 280 285 

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe 
290 295 300 

25 Cys Asn Lys Ser Lys Gin Pro Gly Leu Ala Arg Ser Gin His Asn Arg 

305 310 

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr 
325 330 

30 Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu 

340 345 

Trp Asn Lys Gin Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu 
35 355 360 365 

Asp Val Pro Trp He Thr Leu Asn Ser Ser lie Gin Lys Val Asn Glu 
370 " 37 5 380 

40 Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp 

385 390 395 

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu 
405 *1° 

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys lie Asp Leu Leu 

420 425 
Ala Ser Asp Pro His Glu Ala Leu He Cys Lys Ser Glu Arg Val His 
50 435 440 

Ser Lys Ser Val Glu Ser Asn He Glu Asp Lys lie Phe Gly Lys Thr 
450 4 55 460 

55 Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn 

465 470 

Leu He He Gly Ala Phe Val Thr Glu Pro Gin He He Gin Glu Arg 
485 490 



45 



60 



Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu 
500 505 510 
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His Pro Glu Asp Phe lie Lys Lys Ala Asp Leu Ala Val Gin Lys Thr 



15 



30 



45 



60 



515 



520 



Pro Glu Met lie Asn Gin Gly Thr Asn Gin Thr Glu Gin Asn Gly Gin 
5 530 535 

Val Met Asn lie Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp 

545 550 555 

10 Ser lie Gin Asn Glu Lys Asn Pro Asn Pro He Glu Ser Leu Glu Lys 



Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro He Ser Ser Ser lie Ser 



580 



585 



Asn Met Glu Leu Glu Leu Asn XI. His Asn Ser Lys Ala Pro Lys Lys 



595 



600 



Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His lie His Ala Leu Glu 
20 610 615 



Leu val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gin 
625 63° 635 

25 He Asp ser Cys Ser Ser Ser Glu Glu II. Lys Lys Lys Lys Tyr Asn 



525 630 635 

Asp Ser Cys Ser Ser Ser Glu Glu 

645 650 

Gin Met Pro Val Arg His Ser Arg Asn Leu Gin Leu Me, Glu Gly Lys 



660 



Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gin Thr 



675 



680 



Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn 

35 690 695 

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu 
705 



710 715 



40 Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu 



Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu 

745 - ,u 



740 



Ser Gly Glu Arg Val Leu Gin Thr Glu Arg Ser Val Glu Ser Ser Ser 



lie Ser Leu 
50 770 



Val Pro Gly Thr Asp Tyr Gly Thr Gin Glu Ser lie Ser 



775 



780 



Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys 
785 790 
55 cys val ser Gin Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu lie His 

805 aiU 

Gly cys ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro 
820 825 



Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser lie Glu Met Glu 
835 840 84S 



73 



WO 96/33271 



PCTAJS96/05621 



30 



Glu Ser Glu Leu Asp Ala Gin Tyr Leu Gin Asn Thr Phe Lys Val Scr 
850 855 860 

Lys Arg Gin Ser Phe Ala Pro Phe Ser Asn Pro Gly Asn Ala Glu Glu 
5 865 870 875 880 

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gin Ser 
885 890 895 

10 Pro Lys Val Thr Phe Glu Cys Glu Gin Lys Glu Glu Asn Gin Gly Lys 

900 905 910 

Asn Glu Ser Asn lie Lys Pro Val Gin Thr Val Asn lie Thr Ala Gly - 
91 5 920 925 

15 Phe Pro Val Val Gly Gin Lys Asp Lys Pro Val Asp Asn Ala Lys Cys 

930 935 940 

Ser He Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gin Phe Arg Gly 
20 945 950 955 960 

Asn Glu Thr Gly Leu lie Thr Pro Asn Lys His Gly Leu Leu Gin Asn 
965 970 9/5 

25 Pro Tyr Arg lie Pro Pro Leu Phe Pro lie Lys Ser Phe Val Lys Thr 

980 985 

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met 
995 1000 1005 

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn He Pro Ser Thr Val Ser 
1010 1015 1° 20 

Thr He Ser Arg Asn Asn He Arg Glu Asn Val Phe Lys Glu Ala Ser 
35 1025 1030 1035 104° 

Ser Ser Asn He Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser 
1045 1050 lUbD 

40 Ser He Asn Glu He Gly Ser Ser Asp Glu Asn He Gin Ala Glu Leu 

1060 1065 1° 70 

Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val 
3.075 1080 1085 

Leu Gin Pro Glu Val Tyr Lys Gin Ser Leu Pro Gly Ser Asn Cys Lys 
1090 1095 HOD 

His Pro Glu He Lys Lys Gin Glu Tyr Glu Glu Val Val Gin Thr Val 
50 H05 mo 1115 1120 

Asn Thr Asp Phe Ser Pro Tyr Leu He Ser Asp Asn Leu Glu Gin Pro 
1125 113° x±i 

55 Met Gly Ser Ser His Ala Ser Gin Val Cys Ser Glu Thr Pro Asp Asp 

1140 I 145 11=° 

Leu Leu Asp Asp Gly Glu He Lys Glu Asp Thr Ser Phe Ala Glu Asn 
115 5 1160 1165 

Asp He Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gin Lys Gly 
1170 H75 H80 



45 



60 
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Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gin 
1185 1190 1195 1200 

Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu 
5 " 1205 1210 1215 

Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gin His Leu Leu Phe Gly 
1220 1225 1230 

10 Lys Val Asn Asn He Pro Ser Gin Ser Thr Arg His Ser Thr Val Ala 

1235 1240 1245 



15 



Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys 
1250 1255 1260 

Asn Ser Leu Asn Asp Cys Ser Asn Gin Val He Leu Ala Lys Ala Ser 
1265 1270 1275 1280 

Gin Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe 
20 1285 1290 1295 

Ser Ser Gin Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr 
1300 1305 1310 

25 Gin Asp Pro Phe Leu He Gly Ser Ser Lys Gin Met Arg His Gin Ser 

1315 1320 1325 



30 



35 



60 



Glu Ser Gin Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp 
1330 1335 1340 

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Lys Lys Ser Lys Ala Trp 
1345 1350 1355 1360 

He Gin Thr 



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1852 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Pro Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn Ile Thr Ala Gly
915 920 925

Phe Pro Val Val Gly Gln Lys Asp Lys Pro Val Asp Asn Ala Lys Cys
930 935 940

Ser Ile Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gln Phe Arg Gly
945 950 955 960

Asn Glu Thr Gly Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gln Asn
965 970 975

Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val Lys Thr
980 985 990

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met
995 1000 1005

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn Ile Pro Ser Thr Val Ser
1010 1015 1020

Thr Ile Ser Arg Asn Asn Ile Arg Glu Asn Val Phe Lys Glu Ala Ser
1025 1030 1035 1040

Ser Ser Asn Ile Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser
1045 1050 1055

Ser Ile Asn Glu Ile Gly Ser Ser Asp Glu Asn Ile Gln Ala Glu Leu
1060 1065 1070

Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val
1075 1080 1085

Leu Gln Pro Glu Val Tyr Lys Gln Ser Leu Pro Gly Ser Asn Cys Lys
1090 1095 1100

His Pro Glu Ile Lys Lys Gln Glu Tyr Glu Glu Val Val Gln Thr Val
1105 1110 1115 1120

Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Glu Gln Pro
1125 1130 1135

Met Gly Ser Ser His Ala Ser Gln Val Cys Ser Glu Thr Pro Asp Asp
1140 1145 1150

Leu Leu Asp Asp Gly Glu Ile Lys Glu Asp Thr Ser Phe Ala Glu Asn
1155 1160 1165

Asp Ile Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gln Lys Gly
1170 1175 1180

Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gln
1185 1190 1195 1200

Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu
1205 1210 1215

Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gln His Leu Leu Phe Gly
1220 1225 1230

Lys Val Asn Asn Ile Pro Ser Gln Ser Thr Arg His Ser Thr Val Ala
1235 1240 1245

Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys
1250 1255 1260

Asn Ser Leu Asn Asp Cys Ser Asn Gln Val Ile Leu Ala Lys Ala Ser
1265 1270 1275 1280

Gln Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe
1285 1290 1295

Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr
1300 1305 1310

Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser
1315 1320 1325

Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp
1330 1335 1340

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser
1345 1350 1355 1360

Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr
1365 1370 1375

Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu
1380 1385 1390

Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln
1395 1400 1405

Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln
1410 1415 1420

Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu
1425 1430 1435 1440

Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr
1445 1450 1455

Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu
1460 1465 1470

Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn
1475 1480 1485

Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu
1490 1495 1500

Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg
1505 1510 1515 1520

Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu
1525 1530 1535

Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr
1540 1545 1550

Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile
1555 1560 1565

Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala
1570 1575 1580

Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600

Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Ser Pro Ala Ala
1605 1610 1615

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645

Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660

Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680

Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695

Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710

Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725

Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740

Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760

Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775

Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790

Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805

Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820

Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840

Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr
1845 1850
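A listing such as SEQ ID NO:24 can be checked against its stated length by converting the three-letter codes to one-letter codes and counting them. The same conversion lets a reader find the residue at a numbered position, such as position 61, which claim 2 cites for SEQ ID NO:16. Below is a minimal Python sketch that assumes three things: the listing text has already been corrected for OCR errors (for example "Gin" read for Gln); only the 20 standard residues occur; and the helper names `listing_to_one_letter` and `residue_at` are hypothetical illustrations, not anything defined in the application.

```python
# Hypothetical helpers for reading a three-letter protein listing.
# Assumes OCR errors (e.g. "Gin", "He", "lie") have already been corrected.

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def listing_to_one_letter(listing: str) -> str:
    """Convert a three-letter residue listing to a one-letter string.

    Purely numeric tokens (the position markers under each row) are skipped.
    Any other token that is not a standard three-letter code raises
    ValueError, so leftover OCR debris is reported rather than dropped.
    """
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue  # position markers such as "1 5 10 15"
        try:
            residues.append(THREE_TO_ONE[token])
        except KeyError:
            raise ValueError(f"unrecognized residue code: {token!r}") from None
    return "".join(residues)


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position, as numbered in the listing."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} outside 1..{len(sequence)}")
    return sequence[position - 1]


if __name__ == "__main__":
    first_row = """Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15"""
    seq = listing_to_one_letter(first_row)
    print(seq)                 # MDLSALRVEEVQNVIN
    print(len(seq))            # 16
    print(residue_at(seq, 5))  # A
```

For example, the final row of SEQ ID NO:24 ("Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr") converts to 12 residues. Added to the 1840 residues in the preceding complete rows, this gives the stated length of 1852. Position markers are skipped rather than validated, which keeps the sketch short. A stricter check could also confirm that each marker agrees with the running residue count.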

WHAT IS CLAIMED IS:

1. An isolated nucleic acid comprising BRCA1 allele #5803 (SEQ ID NO:1), 9601 (SEQ
ID NO:2), 9815 (SEQ ID NO:3), 8403 (SEQ ID NO:4), 8203 (SEQ ID NO:5), 388
(SEQ ID NO:6), 6401 (SEQ ID NO:7), 4406 (SEQ ID NO:8), 10201 (SEQ ID
NO:9), 7408 (SEQ ID NO:10), 582 (SEQ ID NO:11) or 77 (SEQ ID NO:12), or a fragment
thereof, wherein said fragment is capable of specifically hybridizing with said allele in the
presence of wild-type BRCA1 under stringent conditions.

2. An isolated translation product of BRCA1 allele #5803 (SEQ ID NO:13), 9601 (SEQ
ID NO:14), 9815 (SEQ ID NO:15), 8203 (SEQ ID NO:17), 388 (SEQ ID NO:18), 6401
(SEQ ID NO:19), 4406 (SEQ ID NO:20), 10201 (SEQ ID NO:21), 7408 (SEQ ID NO:22),
582 (SEQ ID NO:23) or 77 (SEQ ID NO:24), or a C-terminus fragment thereof, or #8403
(SEQ ID NO:16), or a fragment thereof comprising Gly at position 61.

3. A method of diagnosing a patient for a cancer susceptibility, said method comprising
the steps of: 

isolating from said patient a first nucleic acid comprising at least one BRCA1 allele or 
fragment thereof; 

contacting said sample with a second nucleic acid according to claim 1 under 
conditions whereby said second nucleic acid is capable of specifically hybridizing with said first 
nucleic acid; 

detecting the presence or absence of specific hybridization of said second nucleic acid 
with said first nucleic acid; 

wherein the presence of specific hybridization of said second nucleic acid to said first 
nucleic acid is diagnostic of a cancer susceptibility. 

4. A method of diagnosing a patient for a cancer susceptibility, said method comprising 
the steps of: 

isolating from said patient a composition comprising a first translation product of at
least one BRCA1 allele;

contacting said first translation product with a reagent specific for a protein or C-
terminal fragment thereof according to claim 2 under conditions wherein said reagent is
capable of specifically binding said second translation product;

detecting the presence or absence of specifically bound complexes of said reagent and 
said first translation product; 

wherein the presence of said complexes correlates with a cancer susceptibility. 